For an organization that is geographically distributed, understanding how interactions are distributed across the geography becomes important. Presenting that information in a simple and effective manner is best.
We’ll walk through a simple example of how to visualize your state data from Google Analytics (any other analytics solution would be similar).
Within the Google Analytics interface, you can find a visualization that looks like this:
It’s handy for understanding traffic and other engagement patterns from a geographical perspective. We can simplify it a bit an add some depth using some statebins.
The Statebin comes from the Washington Post:
We’ll take the same data from this graph and make it a statebin!
Step one is preparing the data. All you need is a state column and another column (in this case percentage new visitors). Data should look like this:
From here we’ll use R. First thing we need to do is translate the state data into state abbreviations. For example, “New York” would become “NY” and so on. R already has a handy function for this:
From here, we just need to adjust our options and make a plot:
*No prior API or programming knowledge is necessary to complete this tutorial.
How to create a standalone Adobe Analytics real-time dashboard using the API?
Adobe has a nice real-time dashboard built directly into Reports & Analytics (SiteCatalyst). It is easy to configure and get up an running quickly. But how do you create your own real-time dashboard outside of Adobe Analytics using the API?
Use the lessons from an Adobe Summit lab session to walk-though how to create your own Adobe Analytics real-time dashboard. The dashboard uses the Adobe Analytics real-time API and D3.js for data visualization. When you are done with this tutorial your dashboard will show your company’s data and look like the sample below.
####Create a Snazzy Real-Time Adobe Analytics Dashboard Now
1) Update the config.js file in the analytics-realtime-dashboard-example-master that you downloaded above. The path to the file on my computer is: C:\Users\ryanpraski\Desktop\analytics-realtime-dashboard-example-master\js\config.js
Below is a sample config.js file. Make sure to update it with your credentials and save it on your computer.
2) Find the Lesson 6 folder on your computer. The path to the folder on my computer is: C:\Users\ryanpraski\Desktop\analytics-realtime-dashboard-example-master\lessons\lesson_6
3) Open the README file in the lesson_6 folder in your text editor or on GitHub.
4) Open the lesson_6.html file in your web browser and then open it a second time in a text editor. The path to the folder on my computer is: C:\Users\ryanpraski\Desktop\analytics-realtime-dashboard-example-master\lessons\lesson_6\lesson_6.html
5) When you load the lesson_6.html file in your browser you should see an animated number flash on the screen of total page views for the past 15 minutes in the top left, a continuously updating trend line with page views by minute for the past 15 minutes, and a table with the top 10 pages by page views for the last 15 minutes.
6) Follow the instructions in the lesson_6 README file to add the donut chart to your dashboard. If you have trouble getting the the donut chart included on your lesson_6.html file, you can open a complete version of the dashboard with the Lesson 6 code updates already done for you. The path to this complete “final dashboard” file on my computer is: C:\Users\ryanpraski\Desktop\analytics-realtime-dashboard-example-master\index.html
7) Your real-time dashboard should look like the screen shot below and will dynamically update with real-time Adobe Analytics data from the reporting API.
8) If you’d like to create a different report that uses the same visualizations change the query parameters in the lesson_6.html or index.html file within the var params show in the code snippet below:
More often than not, analysts are given data that is unstructured and dirty. In this example, an analyst has been provided a 300MB Excel Workbook containing sales transactions, however it has been noted that several of the transactions have been duplicated in the data. Before doing any analysis of the sales data, the duplicate data needs to be cleaned up.
Our sample dataset contains every order transaction for 2015. The data is structured in such a way that each item purchased, in an order, is a unique row in the data. If an order contained three unique product SKUs, that one order would have three rows in the dataset.
We need to identify orders that contain duplicated order line items, those duplicates would have a SKU that appears in more than one row of data, for a given order_id, as shown below.
Notes about the data:
order_id: A unique id that serves as the key to group line items into a single order
order_item_cd: Product SKU
order_item_quantity: The number each SKU purchased for an order
order_item_cost_price: The individual unit price of a SKU purchased
Let’s begin begin by importing the pandas library. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python.
Next, we will use pandas to import all of the sales data from our Excel Workbook into a pandas DataFrame. The pandas DataFrame, along with Series, is one of the most important data structures you will use as a data analyst. Creating a DataFrame is one of the first things we typically do after launching Python.
A DataFrame is tabular in nature and has a ?spreadsheet like? data structure containing an ordered collection of columns and rows.
We have our data in a DataFrame, which resembles the structure shown in the example sales data above, now we need a way to identify all of the duplicate rows in our dataset. To accomplish this we are going to add a new column to our DataFrame, named is_duplicated, that will hold a boolean value identifying if the row is a duplicate or not. To populate the boolean value for each row in the dataset, we are going to use a very powerful function called duplicated which we will feed the columns we want to evaluate for duplication. In this case, we will mark a row as a duplicate if we find more than one row that has the same order_id and order_item_sku.
Before we go any further, let's get a sense of how big of an issue we have here. We are going to sum up all of the rows that were marked as a duplicate. This will give us the number of duplicate line items in the dataset.
Ok, so this isn’t just a few records here and there that are a problem, our dataset contains 13,372 line items that have been marked as being duplicates. Before we look to do any cleanup, let’s further understand the impact of this dirty data. Let’s find out how many duplicate units are in our dataset, we would expect at least 13,372 units but there is a high likelihood that customers often purchase more than one unit of any given SKU.
Like we did in the previous step, let’s sum up the number of items purchased that were marked as being duplicates in our dataset.
Let’s see what the impact to revenue is. In order to get the impacted revenue, we will add a new column to our DataFrame which will be the line item total for each row in the dataset. To get this value, we will multiple quantity purchased by the item cost, for each row.
Once we have the total for each line item, we can again sum all of the duplicated line items, this time using our revenue value.
Let's clean the duplicate data up.
It’s always a good practice to constantly check our work. Let’s do a quick sanity check to make sure we have gotten rid of the problematic duplicate data.
Now we can export this cleaned up dataset so that we can do additional analysis of our 2015 sales data without having to remove all the duplicate data again.