Once painstakingly collected, data is most valuable when actionable steps are taken based on it. Not all methods of safely distributing and communicating results from data are created equal. Dashboards are a safe and effective way to communicate those results.
We’ll walk through an example of how to create a dashboard in R using flexdashboard, a framework for building dashboards with R and Markdown.
For this example we’ll grab data from Google Analytics and use it to make three charts on a dashboard:
A data table
A histogram
A time series chart
To get this all to work, we’ll need to wrap our code in a flexdashboard template. You can find out how to do that here. We’ll use the row-oriented template so our charts display well on the page and the dashboard can scroll vertically.
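For reference, here is a minimal sketch of what a row-oriented flexdashboard skeleton can look like. It writes the R Markdown file with base R so the layout is easy to see: `orientation: rows` stacks the sections as rows and `vertical_layout: scroll` lets the page scroll instead of squeezing everything onto one screen. The file name, title and section headings are placeholders, not the original dashboard’s.

```r
# Minimal sketch: write a row-oriented flexdashboard skeleton to disk.
# File name, title and section headings are placeholders.
fence <- strrep("`", 3)  # chunk fences built programmatically

# Setup chunk that loads flexdashboard when the document is knit
setup_chunk <- c(paste0(fence, "{r setup, include=FALSE}"),
                 "library(flexdashboard)", fence, "")

# A setext-style level-2 header ("Row" underlined with dashes) starts a new row
row_header <- c("Row", strrep("-", 37), "")

# One "###" section per chart, each holding a single R chunk
chart_section <- function(title, code) {
  c(paste("###", title), "", paste0(fence, "{r}"), code, fence, "")
}

rmd <- c(
  "---",
  "title: \"Google Analytics Dashboard\"",
  "output:",
  "  flexdashboard::flex_dashboard:",
  "    orientation: rows",
  "    vertical_layout: scroll",
  "---",
  "",
  setup_chunk,
  row_header,
  chart_section("Chart 1: In-market segments", "# DT table code goes here"),
  row_header,
  chart_section("Chart 2: Traffic histogram", "# Highcharter histogram code goes here"),
  chart_section("Chart 3: Visits over time", "# Highcharter time series code goes here")
)

writeLines(rmd, "dashboard.Rmd")
```

Knitting the resulting file should give an empty layout with the table on the first row and the two charts side by side on the second; the placeholder comments mark where each chart’s code goes.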
For chart 1 we’ll use DT, an R interface to the DataTables JavaScript library, to create a searchable and sortable table for our dashboard. I used data from the in-market segment in Google Analytics. Plotting the table with default styling is easy:
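Since the Google Analytics in-market segment export isn’t reproduced here, this minimal sketch uses R’s built-in `mtcars` data as a stand-in. `DT::datatable()` provides search, sorting and pagination by default; `filter = "top"` adds per-column filters and `pageLength` sets how many rows appear per page.

```r
library(DT)

# Stand-in data: swap in your own Google Analytics data.frame here
segments <- mtcars

# Searchable, sortable table with per-column filters and 10 rows per page
tbl <- datatable(
  segments,
  filter = "top",
  options = list(pageLength = 10)
)
tbl  # printing the widget renders it in the viewer or the knitted dashboard
```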
Chart 2 will be our histogram, built with Highcharter, an R wrapper for the Highcharts JavaScript library. The code for creating a histogram in Highcharter is very simple as well:
This should plot a histogram of traffic to my website from May 2015 through May 2016. This chart is also zoomable, which is a nice feature to get for free.
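A minimal sketch of the same idea, using base R’s `AirPassengers` counts as a stand-in for the site traffic data (which isn’t reproduced here). Passing a plain numeric vector to `hchart()` is enough to get a histogram.

```r
library(highcharter)

# Stand-in data: monthly airline passenger counts from base R's datasets,
# used here in place of the Google Analytics traffic numbers
traffic <- as.numeric(AirPassengers)

# hchart() on a plain numeric vector draws a histogram
hc_hist <- hc_title(hchart(traffic), text = "Traffic distribution (stand-in data)")
hc_hist
```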
Chart 3 is a time series chart. I’ll use Highcharter for this one as well:
This gives us visits over time for the same date range.
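As a sketch, `hchart()` also draws a line chart directly from a `ts` object. Base R’s monthly `AirPassengers` series stands in for the visits data here.

```r
library(highcharter)

# Stand-in data: a monthly ts object from base R's datasets,
# used here in place of the Google Analytics visits
visits <- AirPassengers

# hchart() on a ts object draws a time series (line) chart
hc_ts <- hc_title(hchart(visits), text = "Visits over time (stand-in data)")
hc_ts
```

If your Google Analytics data is instead a data.frame with a Date column, `hchart(df, "line", hcaes(x = date, y = sessions))` is the data.frame equivalent, where `date` and `sessions` are hypothetical column names.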
From here you can click “Knit” in RStudio and it will compile this R Markdown document into HTML. You will have a working, interactive and responsive dashboard. I’ve published a working version to RPubs for reference here.
You love sports. You love data. If you’ve ever gone on an epic journey in search of sports data, you’ve probably resorted to scraping data from sites like ESPN or Baseball Reference and have spent countless hours writing Python code to use the powerful web scraping library BeautifulSoup. Maybe you even have a nice Python client that scrapes stats.nba.com or you wrote an R package to scrape Baseball Reference.
However, we all know that websites get redesigned, formats change and sometimes access to specific stats is revoked (e.g. the NBA did this with player movement tracking but attributes it to ‘technical difficulties’). Suddenly, you are scrambling to update your scraping code in order to account for a few new divs or other elements on the web page that were renamed.
This is where the Stattleship API comes to the rescue. We’ve built an easy-to-access set of sports data, stats and accomplishments for multiple sports (and expanding). We’ve partnered with Gracenote to provide all of our NFL, NBA, NHL and MLB game data. Our service is designed for creative fans who want to use Stattleship data to build sports apps that scale.
We’ve cleaned, structured and categorized all of the box score, player stats and game log data for you. We’ve even quantified specific performances and feats so that you can quickly identify whether a stat is a commonplace occurrence or a record-breaking achievement.
We will focus on a few main endpoints:
games - contains game information such as date, attendance, scoreline, etc.
players - contains player information such as name, position, draft round, school, salary, weight, birthday, etc.
teams - contains team information such as name, division, colors, hashtags, etc.
game_logs - contains player-level game log information such as Kevin Durant’s assists from a specific game, total minutes played, etc.
team_game_logs - contains team-level game log information such as total points, rebounds and turnovers by the Boston Celtics in a specific game
Here’s a basic Entity-Relationship Diagram outlining how these objects relate to one another:
To get started, sign up for a free API token from here. Now load up RStudio and start following along! The first few lines of this R script will install the R package from GitHub to your machine:
(If you don’t have devtools installed you will have to install that first.)
Next you need to set your API token in the R environment:
Now we need to specify the sport, league and endpoint we are interested in. In this case we will fetch all MLB regular season game logs for the Red Sox to date. This is as easy as setting three parameters: sport, league and ep for endpoint.
A fourth parameter, q_body, is where we can set more granular options such as requesting a specific team, stat, player or season.
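The request parameters are plain R objects. In this minimal sketch the sport and league identifiers `"baseball"` and `"mlb"` are assumptions; `team_id = "mlb-bos"` is the Red Sox identifier used later in this post. Keys for narrowing to the regular season or a particular season would also go into `q_body`, but their names are not shown here.

```r
# Request parameters for the Stattleship API
# (sport and league values are assumptions)
sport  <- "baseball"   # sport name
league <- "mlb"        # league identifier
ep     <- "game_logs"  # endpoint: player-level game logs

# Finer-grained filters go into q_body; team_id = "mlb-bos" selects the Red Sox.
# Season or season-type filters would also go here (key names not shown).
q_body <- list(team_id = "mlb-bos")
```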
Now that all of our parameters are set, we can use ss_get_result to send our request to the Stattleship API. Notice the walk=TRUE option. This ensures that the request will walk through all pages of results and return everything. Results are returned 40 rows per page.
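A minimal sketch of the request, wrapped in a hypothetical helper `fetch_all_pages()` so the same call can be reused when switching endpoints. It assumes the wrapper package is installed as `stattleshipR`, that your token is already set, and that `ss_get_result()` accepts `sport`, `league`, `ep`, `query` (for `q_body`) and `walk` arguments. Apart from `ss_get_result` and `walk = TRUE`, those argument names are assumptions, so check the package help page before relying on them.

```r
# Hypothetical helper around the stattleshipR wrapper. The argument names
# passed to ss_get_result() (other than walk) are assumptions.
fetch_all_pages <- function(sport, league, ep, q_body) {
  stattleshipR::ss_get_result(
    sport  = sport,
    league = league,
    ep     = ep,
    query  = q_body,  # assumed name of the argument that takes q_body
    walk   = TRUE     # follow every page of results instead of just the first
  )
}

# Usage (needs a valid API token and network access):
# pages <- fetch_all_pages("baseball", "mlb", "game_logs", list(team_id = "mlb-bos"))
# length(pages)  # one list element per page of results
```

Switching `ep` to `"players"` with the same `team_id` is the kind of change used to pull the roster.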
We now have a list in R that has 5 elements (there were 5 pages of results returned).
We want to combine all of the pages of results into one data.frame:
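One common way to stack the pages is `do.call(rbind, ...)`. This sketch assumes, hypothetically, that each page is a list holding its rows as a data.frame under a `game_logs` element; the two mock pages and their columns are made-up placeholders standing in for the API response.

```r
# Mock stand-in for the paged result: each element is one page, and each page
# is assumed to hold its rows under $game_logs (placeholder values, not real stats)
pages <- list(
  list(game_logs = data.frame(player_id = c("player-a", "player-b"), runs = c(1, 0))),
  list(game_logs = data.frame(player_id = c("player-a", "player-c"), runs = c(2, 1)))
)

# Pull the data.frame out of every page and stack them into one data.frame
combine_pages <- function(pages, element = "game_logs") {
  do.call(rbind, lapply(pages, `[[`, element))
}

game_logs <- combine_pages(pages)
nrow(game_logs)  # 4: two rows from each mock page
```

`rbind` expects every page to have the same columns; if pages can differ, `dplyr::bind_rows()` fills missing columns with `NA` instead.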
Let’s check out all of the data we now have access to from these games:
Wow, over 80 variables for each player including strikeouts, walks, doubles, runs, and more.
I want to include player information in this data set, though, so I have more than just player_ids. Let’s retrieve all Boston Red Sox players by changing ep to players and passing team_id='mlb-bos' in the q_body list.
To merge the two data.frames, we can simply rename the id column in the players data to player_id and then use merge in R like this:
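A minimal sketch of the rename-and-merge step, using two small mock data.frames; apart from `id` and `player_id`, the columns and all values are placeholders. `merge()` performs an inner join by default, so game-log rows without a matching player are dropped.

```r
# Mock stand-ins for the two API results (placeholder values, not real data)
game_logs <- data.frame(player_id = c("player-a", "player-b", "player-a"),
                        runs      = c(1, 0, 2))
players   <- data.frame(id   = c("player-a", "player-b"),
                        name = c("Player A", "Player B"))

# Rename players$id to player_id so both tables share the join key
names(players)[names(players) == "id"] <- "player_id"

# Inner join on player_id (merge's default, all = FALSE)
merged <- merge(game_logs, players, by = "player_id")
merged
```

Alternatively, `merge(game_logs, players, by.x = "player_id", by.y = "id")` joins on differently named columns without renaming first.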
Now let’s do some basic player calculations using dplyr.
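A sketch of the kind of per-player summary the plot described next needs: total runs, mean batting average, total bases and salary. It assumes the merged data has columns named `name`, `runs`, `batting_average`, `total_bases` and `salary`; those names and the mock values are hypothetical placeholders.

```r
library(dplyr)

# Mock stand-in for the merged game logs (hypothetical columns, placeholder values)
merged <- data.frame(
  name            = c("Player A", "Player A", "Player B", "Player B"),
  runs            = c(1, 2, 0, 1),
  batting_average = c(0.250, 0.300, 0.200, 0.220),
  total_bases     = c(2, 3, 1, 2),
  salary          = c(1e6, 1e6, 5e5, 5e5)
)

# One row per player: totals for counting stats, mean of the per-game averages
player_stats <- merged %>%
  group_by(name) %>%
  summarise(
    total_runs  = sum(runs),
    mean_ba     = mean(batting_average),
    total_bases = sum(total_bases),
    salary      = first(salary),  # salary repeats on every row for a player
    .groups     = "drop"
  )
player_stats
```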
Good! Now let’s plot it all using ggplot2.
This plot actually lets us visualize 4 different variables at once. The x and y axes display total runs and mean batting average, the size of each label indicates how many total bases the player has, and the color indicates salary. Xander Bogaerts and Travis Shaw are already looking like quite a bargain at this point in the season.
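A minimal ggplot2 sketch of that four-variable layout: runs on x, mean batting average on y, label size for total bases and colour for salary, with player names drawn by `geom_text()`. The `player_stats` data.frame here is a hypothetical placeholder with the columns such a summary would have, not the actual Red Sox numbers.

```r
library(ggplot2)

# Mock per-player summary (hypothetical placeholder values)
player_stats <- data.frame(
  name        = c("Player A", "Player B", "Player C"),
  total_runs  = c(30, 12, 20),
  mean_ba     = c(0.310, 0.240, 0.280),
  total_bases = c(90, 40, 60),
  salary      = c(5e5, 8e6, 2e6)
)

# Four variables at once: x, y, label size and label colour
p <- ggplot(player_stats,
            aes(x = total_runs, y = mean_ba, label = name,
                size = total_bases, colour = salary)) +
  geom_text() +
  labs(x = "Total runs", y = "Mean batting average",
       size = "Total bases", colour = "Salary")
p
```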
Let us know if you build anything cool with this awesome sports data API! We also welcome contributions to our R wrapper via pull requests on GitHub. We have a public Slack channel as well where you can join us to talk sports and data, get R or Stattleship API help, and provide feedback.