DirectMonster.js: A Solution for Dark Social

October 2, 2013


Back in October of 2012, Alexis Madrigal of The Atlantic wrote an article titled Dark Social: We Have the Whole History of the Web Wrong. If you’re anything like me, you read it with great excitement – Direct traffic is a huge problem, and one that most people don’t even realize they have.

First, a quick recap: Direct traffic is meant to represent visits where a visitor did one of the following things:

1.) They typed the URL directly into the browser’s address bar
2.) They bookmarked your site and used the bookmark to visit after their UTMZ cookie had expired (6 months)

But really, Direct traffic is just traffic that comes to your site without a ‘document.referrer’. This means that there are a variety of situations where someone is being referred by a shared link, and not visiting, well, Direct. Some examples:

1.) Someone shared the URL with them via IM or email, and they clicked it
2.) iOS 6 misbehaving (this was recently fixed, as you probably noticed)
3.) Analytics implementation & configuration issues
4.) Click-throughs from social media apps like Facebook
5.) And quite a bit more (Jim Gianoglio has an exhaustive piece on potential sources)

Of course, it’s not even that simple. In Google Analytics, direct traffic can behave differently depending on a few factors. If a direct visitor is a returning visitor, the most recent referrer information that Google Analytics has on them will ‘fall through’ to their present visit. If someone came to your site through Google/Organic, left, then returned by manually typing in the URL the next day, they would appear as a single visitor with two visits, both from Google/Organic. In fact, as long as they come back to the site before the UTMZ cookie that Analytics sets expires, they can cause their referring information to ‘fall through’ over and over. The idea is that it’s better to give credit to the most recent source rather than nothing at all; after all, even an ‘assisting referrer’ is actionable information, whereas ‘Direct / None’ isn’t very helpful. This is part of the reason that there are sometimes discrepancies between visits in Google Analytics and clicks in AdWords, Facebook Ads, or even just Bit.ly.
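To make the ‘fall through’ behavior concrete, here is a simplified Python model of it. It assumes, for illustration only, that the cookie’s roughly six-month lifetime is renewed on every visit and that any visit with a referrer or campaign tags overwrites the stored source; real Google Analytics applies more rules than this.

```python
from typing import List, Optional, Tuple

# Simplified model: the cookie lasts about six months and (by assumption)
# is renewed on every visit.
COOKIE_LIFETIME_DAYS = 183
DIRECT = "(direct) / (none)"

def attribute_visits(visits: List[Tuple[int, Optional[str]]]) -> List[str]:
    """Label each (day, source) visit; source is None when the visit has
    no referrer and no campaign tags."""
    stored = DIRECT                  # source remembered in the simulated cookie
    last_seen: Optional[int] = None  # day the simulated cookie was last renewed
    labels = []
    for day, source in visits:
        expired = last_seen is None or day - last_seen > COOKIE_LIFETIME_DAYS
        if source is not None:
            stored = source          # a real referrer or campaign overwrites it
        elif expired:
            stored = DIRECT          # nothing left to fall through
        # otherwise the earlier source falls through to this direct visit
        last_seen = day
        labels.append(stored)
    return labels

print(attribute_visits([(0, "google / organic"), (1, None), (150, None), (400, None)]))
# ['google / organic', 'google / organic', 'google / organic', '(direct) / (none)']
```

The visit on day 400 comes more than six months after the previous one, so in this model the stored source has expired and the visit finally shows up as Direct.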

This has been a particular pain point for LunaMetrics. A big part of what we do here is our Google Analytics & Google AdWords seminars. Because attendees are usually sponsored by their employers, we run into some challenging attribution problems. For example, someone could be browsing an event calendar, see our seminars, click through, and decide they want to attend. If they make the purchase themselves, the conversion will be attributed to ‘Calendar / Listing’, but if they email the link to procurement and procurement makes the purchase, it will be attributed to ‘Direct / None’. Adding insult to injury, we also lose all the cross-channel visit history that we might have had for the interested employee: any AdWords ads they might have clicked, any display ads they might have viewed, and so on. All of it is lost, and with it our ability to correctly determine the effectiveness of our marketing. You can see why this could be problematic, I’m sure.

Currently, the best way to mitigate Direct traffic is to add campaign parameters to your links. Many of you might be familiar with these already, either from personal use or from seeing them on the web: the little strings like “utm_source=Dan&utm_medium=WordOfMouth” that follow a ? in a link URL. These essentially hard-code the source, medium, and any other parameters you specify into Google Analytics, overruling whatever the software thinks the referring information might be. In this way, visits that might otherwise have been swallowed up by the ‘Direct’ monster are instead attributed to the source and medium of your choosing. By applying campaign parameters to URLs wherever possible (a LunaMetrics best practice), you can help ensure that visitors are correctly attributed in your Analytics. Google’s URL Builder is as good a tool as any to accomplish this.
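Tagging a link amounts to appending utm_ parameters to its query string. As a rough illustration (the function name and the example.com domain are hypothetical), Python’s standard urllib.parse can do what a URL builder does:

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def tag_url(url: str, source: str, medium: str,
            campaign: str = "", content: str = "") -> str:
    """Append Google Analytics campaign parameters to a URL's query string."""
    params = {"utm_source": source, "utm_medium": medium}
    if campaign:
        params["utm_campaign"] = campaign
    if content:
        params["utm_content"] = content
    parts = urlsplit(url)
    # Keep any existing query parameters and add the campaign ones after them.
    query = "&".join(q for q in (parts.query, urlencode(params)) if q)
    return urlunsplit(parts._replace(query=query))

print(tag_url("http://www.example.com/seminars/", "Dan", "WordOfMouth"))
# http://www.example.com/seminars/?utm_source=Dan&utm_medium=WordOfMouth
```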

This is an imperfect solution. For example, if a visitor came through on a URL tagged ‘utm_source=twitter’, then copied & pasted the URL from the toolbar into Facebook and preserved those campaign parameters, all of those Facebook visitors would be erroneously classified as visits from Twitter; likewise for conversions. This is not a total loss since Twitter is the original ‘Assisted Referrer’ and deserves some credit, but in terms of actionable data, it’s more valuable to know that those visitors came from Facebook.

For a long time, this has been the status quo. Tag URLs where you can, attribute as best you can, and analyze the rest. I believed there had to be a better way.

The ‘Aha!’ moment came when considering the wider picture of a direct traffic interaction. When you think about it, Direct traffic boils down to one thing: Visitor A sends a link to Visitor B through a platform that doesn’t pass a referrer to Google Analytics. Sure, we don’t have information on Visitor B; but what about Visitor A, the person who copies and shares the link? They come to the site, copy the URL from the toolbar, and pass it along. The idea was simple: what if we dynamically appended every visitor’s referral information as campaign parameters to their URL, so that if that visitor copied and shared the link, we’d be able to attribute the visitors it drove to a source and medium other than ‘Direct / None’? And thus, DirectMonster.js was born.

How it works

The way DirectMonster.js works is pretty simple: when a visitor lands on the site, it gets their referrer information, ciphers it, and stores it behind a ‘#’ in their URL, as shown below. At this point the information is dormant; it doesn’t cause Analytics to do anything.
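The trick relies on the fact that everything after the ‘#’ is a URL fragment, which the browser does not send to the server with the page request and which is kept separate from the query string where campaign tags normally live. Here is a minimal Python sketch of stashing values there and reading them back; the short keys s, m, and ts are hypothetical and not necessarily the plug-in’s own.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def stash_in_fragment(url: str, values: Dict[str, str]) -> str:
    """Store values after the '#', leaving the path and query string untouched."""
    return urlunsplit(urlsplit(url)._replace(fragment=urlencode(values)))

def read_fragment(url: str) -> Dict[str, str]:
    """Read back the values stored in a URL's fragment."""
    return {key: vals[0] for key, vals in parse_qs(urlsplit(url).fragment).items()}

link = stash_in_fragment("http://www.example.com/seminars/?id=7",
                         {"s": "uftu", "m": "tpdjbm", "ts": "1380000000"})
print(link)  # http://www.example.com/seminars/?id=7#s=uftu&m=tpdjbm&ts=1380000000
```

The query string (id=7) is unchanged, so nothing about the page request itself differs until a script chooses to read the fragment.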

[Screenshot: a URL with the ciphered referral parameters stored after the ‘#’]

Although it looks like gibberish, those parameters are just the visitor’s referral source and medium, with each letter shifted one place forward in the alphabet (u=t, f=e, t=s, so ‘uftu’ reads as ‘test’). Built into the plug-in is a list of common referring values, like ‘organic’, ‘google’, and everyone’s favorite, ‘(not provided)’, which get swapped for single-letter replacements to cut down on URL length. The last parameter, ts=, is either the timestamp from the visitor’s UTMA cookie (Asynchronous version) or the user’s CID (Universal version).
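The cipher described above is a one-letter Caesar shift plus a lookup table of abbreviations. Here is a Python sketch of both directions. The abbreviation letters are hypothetical, and the sketch lowercases values so that uppercase letters stay free for abbreviations; that is a design choice of this example, not something the plug-in is known to do.

```python
import string

# Hypothetical abbreviation table; the plug-in's actual list and letters may differ.
ABBREVIATIONS = {"google": "G", "organic": "O", "(not provided)": "N"}
EXPANSIONS = {short: full for full, short in ABBREVIATIONS.items()}

LOWER = string.ascii_lowercase
SHIFT_FORWARD = str.maketrans(LOWER, LOWER[1:] + LOWER[0])  # a->b, ..., z->a
SHIFT_BACK = str.maketrans(LOWER, LOWER[-1] + LOWER[:-1])   # b->a, ..., a->z

def encode_value(value: str) -> str:
    """Abbreviate a known value, otherwise shift each letter forward by one."""
    value = value.lower()  # so shifted output never contains uppercase letters
    if value in ABBREVIATIONS:
        return ABBREVIATIONS[value]
    return value.translate(SHIFT_FORWARD)

def decode_value(token: str) -> str:
    """Expand an abbreviation, otherwise shift each letter back by one."""
    if token in EXPANSIONS:
        return EXPANSIONS[token]
    return token.translate(SHIFT_BACK)

print(encode_value("test"), decode_value("uftu"), encode_value("google"))  # uftu test G
```

Non-letters such as dots pass through unchanged, so a source like t.co round-trips as u.dp.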

Now, when a visitor hits our site, copies the URL, and emails it to someone, they pass along those latent parameters. If someone visits through the shared link and has no referrer information, DirectMonster compares their unique identifier with the one saved in the link URL. If the two are different, it decodes the stored parameters into campaign parameters, which Google Analytics then records as the user’s referring source, medium, and so forth.
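The decision DirectMonster makes on a shared-link visit can be sketched as a single function. This is an illustrative Python version, not the plug-in’s code: it takes fragment values that have already been deciphered, uses the hypothetical keys s, m, c, and ts, and appends the ‘-slb’ marker the post describes to utm_content.

```python
from typing import Dict, Optional

SHARED_LINK_SUFFIX = "-slb"

def campaign_for_shared_visit(referrer: str, own_id: str,
                              stored: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return utm_* values for a referrer-less visit through a shared link,
    or None when Google Analytics should handle the visit normally."""
    if referrer:
        return None  # a real referrer is attributed the usual way
    if not all(key in stored for key in ("s", "m", "ts")):
        return None  # the link carries no stored referral information
    if stored["ts"] == own_id:
        return None  # the sharer reopening their own link is not an assist
    return {
        "utm_source": stored["s"],
        "utm_medium": stored["m"],
        # Mark the visit so assisted referrals can be segmented out later.
        "utm_content": stored.get("c", "") + SHARED_LINK_SUFFIX,
    }
```

Comparing identifiers is what keeps Visitor A’s own return trips through their bookmarked or pasted URL from being counted as referrals.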

[Screenshot: the stored parameters decoded into campaign parameters]

Most importantly, ‘-slb’ is appended to the end of whatever ‘utm_content=’ value a visitor has. This way, we can segment out those visits later in Analytics and differentiate between ‘assisted’ referrals and normal referrals. The implementation can be customized to suit an analyst’s data needs.
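Because the marker is a fixed suffix, separating assisted visits from the rest only takes an anchored pattern such as -slb$, the kind of regular expression a segment condition can use. Here is a small Python sketch over hypothetical (utm_content, visits) report rows:

```python
import re
from typing import Iterable, Optional, Tuple

ASSISTED = re.compile(r"-slb$")  # utm_content values that end with the marker

def split_assisted(rows: Iterable[Tuple[Optional[str], int]]) -> Tuple[int, int]:
    """Total visits for assisted referrals vs. everything else."""
    assisted = other = 0
    for content, visits in rows:
        if content and ASSISTED.search(content):
            assisted += visits
        else:
            other += visits
    return assisted, other

print(split_assisted([("banner-slb", 5), ("banner", 3), (None, 2), ("-slb", 1)]))  # (6, 5)
```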

From here, things get a little complicated: we’ve got two different versions, one for the asynchronous code (v1.X.X) and one for Universal Analytics (v2.X.X). They both do basically the same thing: better attribution for direct traffic and conversions inside Google Analytics. There are a few key differences between the two, however. 1.X.X (async) can be added to a site simply by hosting the script and including it above the Google Analytics tracking code. 2.X.X (Universal) takes advantage of some advanced features, which means a few modifications have to be made to the Universal Analytics tracking code; however, I think the benefits are definitely worth the extra effort.

Universal & DirectMonster – Introducing Cross-User Analytics

The Universal version of DirectMonster takes advantage of the client ID (CID) generated by Google Analytics to track not only the assisted referral information but also the assisting referrer. The CID is a unique value that Analytics generates and stores inside the _ga cookie. DirectMonster 2.X.X uses this value instead of a timestamp to determine when to decode the parameters it has stored. If the value saved in the URL that Visitor B visits is different from the value in Visitor B’s cookie, and Visitor B has no referral information, it decodes the stored values as campaign parameters. It also passes those two values (Visitor A’s CID & Visitor B’s CID) as custom dimensions: the Assisted Referrer and the CID, respectively.
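Here is a Python sketch of the Universal comparison. It reads the client ID from a _ga cookie value (whose usual shape is GA1.2.&lt;random&gt;.&lt;timestamp&gt;, with the CID being the last two dot-separated fields) and, when the conditions above hold, returns the two values keyed by analytics.js-style dimension field names. The slots dimension1 and dimension2 are hypothetical; use whichever indexes you created in the interface.

```python
from typing import Dict, Optional

# Hypothetical custom dimension slots; match them to the ones you created.
ASSISTED_REFERRER_DIMENSION = "dimension1"
CLIENT_ID_DIMENSION = "dimension2"

def client_id_from_ga_cookie(cookie_value: str) -> Optional[str]:
    """'GA1.2.1234567890.1380000000' -> '1234567890.1380000000'."""
    fields = cookie_value.split(".")
    if len(fields) < 4:
        return None
    return ".".join(fields[-2:])

def cross_user_dimensions(link_cid: str, ga_cookie: str,
                          has_referrer: bool) -> Dict[str, str]:
    """Custom dimension values for Visitor B, or {} when nothing should be decoded."""
    own_cid = client_id_from_ga_cookie(ga_cookie)
    if has_referrer or own_cid is None or own_cid == link_cid:
        return {}
    return {
        ASSISTED_REFERRER_DIMENSION: link_cid,  # Visitor A, who shared the link
        CLIENT_ID_DIMENSION: own_cid,           # Visitor B, who clicked it
    }
```

Storing Visitor B’s own CID alongside Visitor A’s is what lets the two visit histories be joined later in reports.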

What we can then do with that data is answer questions like:

1.) How many visits did a 3rd party’s tweet drive to my site, and how much revenue did those visits generate?

[Screenshot: visits from a tweet]

[Screenshot: all visits referred]

2.) How do people who end up linking to my site generally interact with it? How often do they visit? Can we optimize for visitors who we think might link to us?

3.) What happens when I attribute single-visit ‘Direct / None’ conversions to the visit history of the assisting referrer?

4.) What products are people more likely to share with their friends after they purchase, and how do those shares drive conversions?

5.) How many users, on average, touch a conversion? What about the median?
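Question 5 comes down to simple arithmetic once conversions can be tied to client IDs. As a sketch, assuming you have exported each conversion with the CIDs involved in it (the converter plus any assisting referrers, via the custom dimensions) into a hypothetical mapping, Python’s statistics module gives both figures:

```python
from statistics import mean, median
from typing import Dict, List, Tuple

def users_per_conversion(conversions: Dict[str, List[str]]) -> Tuple[float, float]:
    """Mean and median number of distinct users (CIDs) touching each conversion."""
    counts = [len(set(cids)) for cids in conversions.values()]
    return mean(counts), median(counts)

# Hypothetical example data: transaction ID -> CIDs linked to it.
example = {"T1": ["111.1"], "T2": ["222.2"], "T3": ["333.3", "444.4", "555.5", "666.6"]}
print(users_per_conversion(example))  # (2, 1)
```

The gap between the two numbers in this toy data shows why the median is worth asking about: a few heavily shared conversions can pull the average well above the typical case.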

As far as I know, this kind of attribution hasn’t been done before; I’m pretty excited by it. Google Analytics Asynchronous gave us cross-channel attribution. Universal Analytics gives us cross-device attribution. Universal Analytics and DirectMonster unlock, for the first time, cross-user attribution.

Implementation

Now for the nitty-gritty. Implementing DirectMonster 1.X.X is simple: visit the DirectMonster GitHub, download the 1.X.X script, host it on your domain, and include it in your header above the Google Analytics tracking code (GATC). That’s it; it will start doing its magic right out of the box. Again, I have to stress: you must place it above the GATC in order for it to work properly.

DirectMonster 2.X.X, for Universal, is a little more complicated to implement. In order to use its CID-capturing functionality, you’ll need to create the requisite Custom Dimensions inside the Analytics interface and customize your Universal GATC a little. You’ll also need to set the ‘ownedHostNames’ variable to a RegEx that matches all the domains you do not want to appear as sources/mediums. For example, the LunaMetrics code simply has /www.lunametrics.com/ defined, since it is the only domain we’re tracking with our GATC. For complete instructions, please reference the readme.txt inside the 2.X.X folder on the DirectMonster GitHub.
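The ownedHostNames check amounts to testing a referrer’s hostname against a pattern. Here is a Python sketch with a hypothetical example.com pattern. Note that the dots are escaped: an unescaped . in a regular expression matches any character, so a pattern written as www.lunametrics.com would also match a string like wwwxlunametricsycom. In the plug-in this is a JavaScript RegExp, so treat this only as an illustration of the matching logic.

```python
import re
from urllib.parse import urlsplit

# Hypothetical owned-domain pattern: example.com and any of its subdomains.
OWNED_HOST_NAMES = re.compile(r"(^|\.)example\.com$")

def is_external_referrer(referrer: str) -> bool:
    """True when the referrer is present and not one of our own domains."""
    host = urlsplit(referrer).hostname or ""
    return bool(host) and OWNED_HOST_NAMES.search(host) is None

print(is_external_referrer("https://t.co/abc"))               # True
print(is_external_referrer("http://www.example.com/about/"))  # False
```

Anchoring the pattern at the end, and requiring either the start of the string or a dot before the domain, keeps look-alike hosts such as notexample.com from being treated as your own.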

Going Forward

For us, the results have been incredible.

[Chart: direct conversions]

We’ve dropped the share of our conversions attributed last-touch to ‘Direct / None’ from 45% to just over 20%, and with Universal, we’ll soon be able to stitch together a complete picture of all of those conversions. Better still, our overall site traffic is now more accurately attributed: we get a real picture of the actual value that social networks, email, and other one-to-one sharing environments drive to our website. It really is remarkable, and it only scratches the surface of what this kind of data can do.

We’ve decided to make DirectMonster, in its current state, available as open source code on GitHub. We’re excited to see what the community does with it, and I hope to soon see it out in the wild.