Many websites use Google Analytics to track visitor data and actions within the site. However, as websites grow, Google Analytics (GA) must retrieve ever larger amounts of information, and many webmasters run into problems related to sampling and other factors. The following tips will help you get the most out of your GA reports and, in turn, improve your business.
1. Eliminate sampling errors by slicing data into small ranges
GA only analyzes a maximum of 500,000 data rows for most queries, with some exceptions. If you run a query whose results would cover more than 500,000 rows, GA picks a sample of 500,000 rows, analyzes that, and then multiplies the figures back up to the original scale. Suppose your website had 4 million new visitors in one month and you request a report of where those visitors came from. GA will sample 12.5% of your visitor data, analyze the sources within that sample, and then multiply the resulting figures by 8 for the report.
This is a reasonable and common statistical technique that reduces data processing time. The danger with extrapolating in this manner, though, is that reports may not always be accurate. In some cases, they may be altogether misleading.
To get accurate data, slice your queries into date ranges small enough that GA can analyze each one without sampling. For the example above, 4 million visitors in a month works out to roughly 500,000 every 3-4 days, so make eight or so queries of 3-4 days each and then stitch the findings back together. The results will be accurate, since each slice is reported as is.
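As a minimal sketch of the slicing-and-stitching step in Python, the snippet below splits a month into four-day windows. It assumes a hypothetical run_report() helper that fetches one small date range from the API (see the next tip); the window size and dates are just examples.

```python
from datetime import date, timedelta

def date_slices(start, end, days_per_slice=4):
    """Yield (slice_start, slice_end) pairs that together cover start..end."""
    current = start
    while current <= end:
        slice_end = min(current + timedelta(days=days_per_slice - 1), end)
        yield current, slice_end
        current = slice_end + timedelta(days=1)

# run_report() is a stand-in for whatever function you use to query
# the Core Reporting API for a single, small date range.
all_rows = []
for slice_start, slice_end in date_slices(date(2016, 3, 1), date(2016, 3, 31)):
    all_rows.extend(run_report(slice_start, slice_end))
```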
2. Figure out how to operate the API
The GA web interface is a powerful and useful tool for exploring data. You can easily pull your reports and share them with relevant colleagues for decision-making. However, it is time-consuming and arduous to slice data as described above, download the results and then piece them back together yourself. Not to mention the risk of introducing human error!
An easier way is to use the Core Reporting API. Start with the Query Explorer, a friendlier way to experiment with queries and see which dimensions and metrics are available, then use Python or another programming language to grab the data and piece the slices back together. The entire process is hassle-free once you learn it.
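As a rough sketch of one way to do it, here is what a single small-range query might look like in Python with the google-api-python-client library and a service account; the key file name and view ID are placeholders for your own.

```python
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    'client_secrets.json',  # your service-account key file
    ['https://www.googleapis.com/auth/analytics.readonly'])
analytics = build('analytics', 'v3', credentials=credentials)

# One small, unsampled slice: pageviews per page for a four-day window.
response = analytics.data().ga().get(
    ids='ga:12345678',            # your view (profile) ID
    start_date='2016-03-01',
    end_date='2016-03-04',
    metrics='ga:pageviews',
    dimensions='ga:pagePath').execute()

rows = response.get('rows', [])
```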
3. Have an actual database to store your GA data
Given that you’ll have to stitch your data slices back together locally, it helps to have a central place to store the information. It keeps your data orderly and lets you reuse it as required in the future. You can have a remote database expert modify your website database by adding a few tables to store your GA data. This is helpful because traffic data can then be paired with the relevant articles rather than with URLs, which may change over time.
Additionally, doing this lets you run more complex article-related queries, e.g. the 10 most viewed articles in a given category over the past month, or a list of every long-term author’s 5 most popular posts (with your own definition of what counts as long-term). These queries are normally difficult to run because GA has no native notion of ‘author’ or ‘category’. The other upside is that it’s easier to export data from a single database when feeding third-party visualization and analysis tools such as Tableau.
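As an illustration of the kind of schema that makes these queries easy, here is a small sketch; the table and column names are made up, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect('analytics.db')  # or your website's own database
conn.executescript("""
CREATE TABLE IF NOT EXISTS articles (
    article_id INTEGER PRIMARY KEY,
    author     TEXT,
    category   TEXT,
    title      TEXT
);
CREATE TABLE IF NOT EXISTS daily_pageviews (
    article_id INTEGER REFERENCES articles(article_id),
    day        TEXT,       -- ISO date, e.g. '2016-03-01'
    pageviews  INTEGER,
    PRIMARY KEY (article_id, day)
);
""")

# The 10 most viewed articles in a given category over the past month.
top_ten = conn.execute("""
    SELECT a.title, SUM(d.pageviews) AS views
    FROM articles a JOIN daily_pageviews d USING (article_id)
    WHERE a.category = ? AND d.day >= date('now', '-30 days')
    GROUP BY a.article_id
    ORDER BY views DESC
    LIMIT 10
""", ('analytics',)).fetchall()
```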
4. Make full use of your view options
GA allows up to 10,000 API requests a day for each reporting view. If you have more than 10,000 posts across your website, getting information for each post through a single view will take more than one day, depending on how many posts there are.
What many don’t know is that views live within properties, and each property is allowed up to 50,000 requests a day. You can therefore set up a property with a related set of views, e.g. one view per topic or author, plus a view covering the site in its entirety. You can then query each topic’s view for data on that topic’s posts, drawing on the property’s 50,000-request limit rather than being capped at a single view’s 10,000.
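One simple way to organize this in code is a mapping from topic to view ID, so each query is routed to the view whose quota it should count against; the IDs below are placeholders.

```python
# Each topic has its own view (all within one property), plus a
# site-wide view as a fallback. The IDs here are made up.
TOPIC_VIEWS = {
    'politics': 'ga:11111111',
    'sport':    'ga:22222222',
    'culture':  'ga:33333333',
}
SITE_WIDE_VIEW = 'ga:99999999'

def view_for(topic):
    """Return the view ID whose 10,000/day quota this query should use."""
    return TOPIC_VIEWS.get(topic, SITE_WIDE_VIEW)
```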
5. First retrieve broad data, then narrow down
A lot of work can go into creating a GA report. Suppose you want to create a map of traffic sources by country for certain articles on your site. If you have 250 posts over a period of about 100 days, you’d need to run 25,000 queries to cover each post on each day. But consider that some posts will probably have no views at all on some days, and that GA doesn’t sample ‘overall pageviews’ even though it does sample ‘pageviews by country’.
Instead, make one query per post requesting overall pageviews per day. Then make a second round of queries, one for each post on each day it had at least one pageview, asking for the pageviews-by-country breakdown. This massively reduces the number of separate queries you have to send, speeding up the process and saving that precious GA API request quota.
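In code, that two-pass approach might look roughly like this, assuming hypothetical get_daily_pageviews() and get_country_breakdown() helpers that wrap the API calls from tip 2.

```python
# Pass 1: one query per post, asking for overall pageviews per day.
# Pass 2: only for the (post, day) pairs that had traffic, ask for the
#         pageviews-by-country breakdown.
country_data = {}
for post in posts:                              # posts: your list of post URLs or IDs
    daily = get_daily_pageviews(post)           # e.g. {'2016-03-01': 42, ...}
    for day, views in daily.items():
        if views > 0:
            country_data[(post, day)] = get_country_breakdown(post, day)
```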
6. Apply multithreading
You’ll notice that GA takes about 5-7 seconds to return the results of one query. If you have to make a request for every single post and your site has thousands of posts, you’ll be making requests for days. Rather than running each query on its own, consider querying in parallel. You can send up to 10 API requests per second, meaning that in the time it takes to process your first query you can send at least 50 more. Long query queues are processed much faster this way than with serial querying, so work that would have taken 24 hours can now take just one or two hours.
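A simple way to do this in Python is a thread pool that submits requests at roughly the permitted rate; run_report() is again a hypothetical stand-in for a single Core Reporting API query, and queries is your prepared list of jobs.

```python
from concurrent.futures import ThreadPoolExecutor
import time

results = []
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = []
    for query in queries:                # queries: the list of (post, date range) jobs
        futures.append(pool.submit(run_report, *query))
        time.sleep(0.1)                  # keep the send rate at ~10 requests per second
    for future in futures:
        results.append(future.result())  # blocks until that query has finished
```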
7. Save, save, save
Finally, plenty can go wrong as you work: your code may crash, your connection may drop, or you may exhaust your GA API quota just as you reach the 95% point of downloading your data. You don’t want to start afresh once the problem passes. Ensure that your data is saved to disk as it downloads rather than held only in memory, and structure your script so it can skip ahead to where it left off when the download resumes.
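Here is a minimal sketch of that save-as-you-go pattern, assuming the (post, day) work list and hypothetical get_country_breakdown() helper from tip 5; the output file name is illustrative.

```python
import csv
import os

OUT_FILE = 'pageviews_by_country.csv'

def already_done(path):
    """Return the set of (post, day) pairs already written to disk."""
    if not os.path.exists(path):
        return set()
    with open(path, newline='') as f:
        return {(row[0], row[1]) for row in csv.reader(f)}

done = already_done(OUT_FILE)
with open(OUT_FILE, 'a', newline='') as f:
    writer = csv.writer(f)
    for post, day in work_items:                      # built as in tip 5
        if (post, day) in done:
            continue                                  # already fetched on a previous run
        for country, views in get_country_breakdown(post, day):
            writer.writerow([post, day, country, views])
        f.flush()                                     # write through to disk immediately
```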
How have you used the data you get from Google Analytics to grow your business and strengthen your website? Share your experiences in the comments below.