Tracking PDFs And Other Downloads Inside Google Analytics… Server‑Side!
Google Analytics is great for tracking just about anything – inside a webpage. Google’s JavaScript code sits nicely on your website’s HTML pages, and tracks all of your site’s pageviews, visitor session information and various user interactions and events (including resource downloads from links, with some minor tweaking).
The thing about JavaScript, though, is that it runs in the visitor’s browser (client-side), loaded from a snippet tucked neatly inside the page’s <head> tag. GA can only track a resource when that JavaScript actually runs as part of the request.
This is all fine and dandy – except that the world doesn’t always work that way. People sometimes hotlink to PDFs, Word docs and images and visit them directly. And thank goodness! Can you imagine a world without direct links to imgur.com memes?
In situations where visitors access a non-HTML resource directly, Google Analytics is not the tool for the job. An analyst would have to dig through raw server logs to determine how many times a PDF was requested. For example, we determined that nearly half of one client’s PDF downloads came directly from links in an email blast campaign. Since no visit to the website was involved, Google Analytics was clueless. Yet important data was sitting idle inside server logs, buried and inaccessible.
Enter server-side Google Analytics. Note: This is a PHP-only solution. Conceptually this can work in other environments, but PHP/Apache is my flavor of choice.
Thomas Bachem and others at United Prototype built a library for server-side Google Analytics called php-ga. Their code allows your PHP server to hit Google Analytics by simulating a JavaScript request. You’ll see what that means in a sec.
By integrating with this library, we can 1) set an .htaccess rule to reroute all PDF downloads through 2) a custom download.php script, which hits the library and 3) fires off a Google Analytics call. You can keep your same folder structure and you won’t need to move any of your existing PDFs! And no additional cookies required! Let’s dig in.
Tools You Need:
- Apache server with PHP 5.3 or greater
- Notepad++, TextMate or a similar quality text editor
- FTP Client like FileZilla
- Google Analytics account
A Cautionary Word:
I strongly recommend using a new Google Analytics property for this. Why? Because the GA session data doesn’t fully match up between the client side (JavaScript) and the server side (PHP). Play it safe and use a separate property. Otherwise, you’ll see a whole lot of New Visitors and wonder why. Don’t say I didn’t warn you!
Step 1.
Download the php-ga library. Look inside the folder labeled src and move autoload.php and the GoogleAnalytics folder to your website’s root directory.
Step 2.
Create a new PHP file called download.php. This is where the magic happens. The script 1) loads up the php-ga library, 2) creates a new visitor hit to GA, 3) tracks a virtual pageview for the PDF, 4) uses cURL to set a custom user-agent called LunaMetrics123 (you’ll see why later on), and 5) fetches the PDF and sends it to the browser.
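<?php
// Load the php-ga library (autoload.php was moved to the site root in Step 1)
require_once __DIR__ . '/autoload.php';
use UnitedPrototype\GoogleAnalytics;

// Initialize the GA tracker
// (placeholder property ID and domain: replace both with your own)
$tracker = new GoogleAnalytics\Tracker('UA-XXXXXXXX-X', 'example.com');

// Assemble Visitor information
$visitor = new GoogleAnalytics\Visitor();
$visitor->setIpAddress($_SERVER['REMOTE_ADDR']);
$visitor->setUserAgent($_SERVER['HTTP_USER_AGENT']);
// Reuse the GA visitor cookie only if the browser sent one (direct hits often won't)
if (isset($_COOKIE['__utma'])) {
    $visitor->fromUtma($_COOKIE['__utma']);
}
//$visitor->setScreenResolution('1480x1200');

// Assemble Session information
$session = new GoogleAnalytics\Session();
if (isset($_COOKIE['__utmb'])) {
    $session->fromUtmb($_COOKIE['__utmb']);
}

// Get filename from the previous request
$filename = parse_url(urldecode($_SERVER['REQUEST_URI']), PHP_URL_PATH);
//$filetype = preg_replace("/.+\.(.+)/i","$1",$filename);

// Assemble Page information
$page = new GoogleAnalytics\Page($filename);
$page->setTitle($filename);
// Direct hits may arrive without a referrer
if (isset($_SERVER['HTTP_REFERER'])) {
    $page->setReferrer($_SERVER['HTTP_REFERER']);
}

// Track page view
$tracker->trackPageview($page, $session, $visitor);

// Create the URL for the PDF
$protocol = ((!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off') || $_SERVER['SERVER_PORT'] == 443) ? "https://" : "http://";
$url = $protocol.$_SERVER['HTTP_HOST'].$filename;

// Fetch the PDF (cURL it)
$ch = curl_init($url);
// This creates a user-agent string that we set .htaccess to ignore (preventing an endless loop)
curl_setopt($ch, CURLOPT_USERAGENT, "LunaMetrics123");
// Tell the browser a PDF is coming; PHP would otherwise label the response text/html
header('Content-Type: application/pdf');
// Without CURLOPT_RETURNTRANSFER, curl_exec() echoes the response body straight to the browser
$data = curl_exec($ch);
curl_close($ch);

// For good measure
exit;
?>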
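The commented-out $filetype line in the script hints at pulling the file extension out of the request. One possible use for it, sketched here under assumptions, is choosing a Content-Type header so the browser knows what it is receiving. The function name contentTypeFor and the list of types are hypothetical and are not part of php-ga or the script above.

```php
<?php
// Hypothetical helper (not part of php-ga or the original script):
// map a requested path to a Content-Type header value.
function contentTypeFor($path)
{
    // Illustrative subset of types; extend it to match the files you serve.
    $types = array(
        'pdf'  => 'application/pdf',
        'doc'  => 'application/msword',
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
    );

    // pathinfo() returns an empty string when the path has no extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    // Fall back to a generic binary type for anything unrecognized.
    return isset($types[$extension]) ? $types[$extension] : 'application/octet-stream';
}
```

In download.php this could be called as header('Content-Type: ' . contentTypeFor($filename)); just before curl_exec(), since anything sent after the body starts streaming is too late to set a header.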
Step 3.
In the root of your website, open the .htaccess file. This is a special system file that may be hidden on some machines. You may need to create a new blank .htaccess file. Don’t forget the leading dot!
(If working locally on Mac OS X, you may not see hidden system files by default. This link can fix that.)
.htaccess provides instructions for the Apache server when a visitor tries to access files. Our job then is to intercept the request for a PDF and reroute it to download.php. Remember the LunaMetrics123 custom user-agent string earlier? That’s a handy hack to make sure that the server doesn’t enter an endless loop… without setting a cookie!
Inside .htaccess, include the following lines:
# Use PHP 5.3 by default (HostGator or other shared hosts might require this line)
AddType application/x-httpd-php53 .php

# Turn on RewriteEngine (it might already be on for you)
RewriteEngine On

# If the user-agent string is NOT set by our PHP script...
RewriteCond %{HTTP_USER_AGENT} !LunaMetrics123 [NC]

# ... rewrite all requests for PDFs to our PHP script
RewriteRule ^.+\.pdf$ /download\.php [L,NC]
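If the RewriteCond and RewriteRule pair feels opaque, here is a rough PHP model of the decision they describe: reroute a request only when it asks for a PDF and did not come from our own cURL fetch. This is an illustration only. Apache evaluates the real rules, and the function name wouldRerouteToDownloadScript is made up for this sketch.

```php
<?php
// Hypothetical model of the two .htaccess directives (illustration only;
// Apache evaluates the real rules, not this function).
function wouldRerouteToDownloadScript($requestPath, $userAgent)
{
    // RewriteCond %{HTTP_USER_AGENT} !LunaMetrics123 [NC]
    // The condition passes only when the user-agent does NOT contain
    // the marker string, compared case-insensitively.
    $isOurOwnFetch = stripos($userAgent, 'LunaMetrics123') !== false;

    // RewriteRule ^.+\.pdf$ ... [NC]
    // In a root-level .htaccess the pattern is tested against the path
    // without its leading slash, again case-insensitively.
    $isPdf = preg_match('/^.+\.pdf$/i', ltrim($requestPath, '/')) === 1;

    return $isPdf && !$isOurOwnFetch;
}
```

A browser asking for /docs/report.pdf gets rerouted, while the same path requested with the LunaMetrics123 user-agent falls through to the real file. That second case is what stops download.php from calling itself forever.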
That’s it!
Once your code is in place, go to your website and try to download a PDF… directly. Don’t visit the website first. If you don’t have a PDF handy, here’s one.
Then… drumroll. Open up your Google Analytics and view the Real-Time reports.
Take a look! You should see one active page: the PDF.
This is just a start. There are so many things we can do with server-side Google Analytics. Why stop at PDF downloads? We could track direct image downloads, Word documents, 404 error pages, etc. Share some other uses in the comments below!