Visitor Tracking with Redis and PHP

A recent project required detailed, instant visitor tracking integrated into the admin GUI of a high-traffic site. There are plenty of enterprise visitor tracking suites (Omniture, Google Analytics, etc.), but none that I could find provided both near-realtime data and the level of customization the project required.

Visitor tracking is a tricky beast because a site's highest-traffic minute can have hundreds or thousands of times more visitors than its lowest-traffic minute. The way traffic grows virally via Facebook and Twitter has only compounded this problem. A suitable database must be able to take many more inserts than selects in a short period of time without crashing. Redis is an ultra-fast key-value store that keeps its data in memory and writes it to disk periodically. By doing so, it can handle many, many more inserts than a traditional database that writes directly to disk. The trade-off is that, at any given time, the most recently stored data may exist only in memory, so it can be lost if the server goes down before the next write to disk. Reads are served from memory, so new data is available immediately. This was an acceptable drawback and met the near-realtime data requirement of the project. Below is a very simplified implementation based on the one used on this project.

Redis

Redis is the backbone of this tracking implementation. It is fast and reliable, and can handle a massive number of keys. Redis is easily installed and configured on a Linux system and is relatively light on memory. The downside of Redis is that it is not a relational database: you have to design the key patterns you store so that you can later retrieve the information you want.

Redis PHP Client (Predis)

To bridge the gap from PHP to Redis, I used the excellent Predis library. Because I'm on CentOS and running PHP 5.2, I had to use the 5.2 backport version of Predis on GitHub. The code below depends on this library.

The Tracking Pixel

My on-page implementation goal was to emulate the old DoubleClick-style 1x1 tracking pixel. In this method, pages have a 1x1 pixel image included in the footer of the page. When the image loads, it's actually loading a server-side script that increments counters in the database (Redis) and then returns image headers and image data so that your browser is none the wiser. All tracking data is reported during this request.

Tracking Script

The script below depends on Predis being available at the included path and on an actual 1x1.gif pixel in the same directory. You can download this pixel here.

File: track.php

<?php

require_once 'predis/lib/Predis.php';

//include domain here starting with "."
$cookie_domain = '.ebrueggeman.com';

//unique visitor timeframe in seconds
$unique_timeframe = 1800;

//track a unique visitor
$unique = FALSE;

//set initial visitor cookie
if (!isset($_COOKIE['vst'])) {
        setcookie("vst", 1, time() + $unique_timeframe, "/", $cookie_domain);
        $unique = TRUE;
}

//set time variables
$day = date('Y-m-d');
$day_hour = date('Y-m-d:G');

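//track pageviews and uniques
//Predis normally connects on the first command rather than in the
//constructor, so the incr calls are kept inside the try block
try {
        $redis = new Predis_Client();

        $redis->incr('pageviews-by-day:' . $day);

        if ($unique) {
                //track unique visitors
                $redis->incr('uniques-by-day:' . $day);
        }
} catch (Exception $e) {
        //log the failure but still serve the pixel below
        error_log('redis tracking failed: ' . $e->getMessage());
}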

//output appropriate headers and serve pixel
$pixel = '1x1.gif';
header('Content-Length: ' . filesize($pixel));
header('Content-Type: image/gif');
header("Cache-Control: no-cache, must-revalidate");
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Date in the past
print file_get_contents($pixel);
?>

To actually see results from the above tracking script, you'll need a results script. The following script displays the pageviews and uniques for today and yesterday.

File: results.php

<?php
require_once 'predis/lib/Predis.php';

$today = date('Y-m-d');
$yesterday = date('Y-m-d', strtotime('yesterday'));

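//fetch the counts; Predis normally connects on the first command,
//so the get calls are kept inside the try block
try {
        $redis = new Predis_Client();

        //a missing key comes back empty, so cast to int to show 0
        $pageviews_today = (int) $redis->get('pageviews-by-day:' . $today);
        $pageviews_yesterday = (int) $redis->get('pageviews-by-day:' . $yesterday);

        $uniques_today = (int) $redis->get('uniques-by-day:' . $today);
        $uniques_yesterday = (int) $redis->get('uniques-by-day:' . $yesterday);
} catch (Exception $e) {
        error_log('redis read failed: ' . $e->getMessage());
        die('Tracking data is unavailable.');
}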

echo "Pageviews Today: $pageviews_today <br>";
echo "Pageviews Yesterday: $pageviews_yesterday <br>";
echo "Uniques Today: $uniques_today <br>";
echo "Uniques Yesterday: $uniques_yesterday <br>";

?>

To test all this, hit track.php in your browser, then load results.php and watch the pageview count increase. The unique count rises only on the first hit within each 30-minute cookie window.

Site Integration

The cleanest way of integrating this tracking call into your site is by injecting the pixel onto the page in JavaScript. This is the JS snippet I used, based on the Google Analytics async tracking code. Injecting the pixel from a script keeps the page from slowing down if the tracking pixel cannot be loaded for some reason, and it also keeps bots that do not execute JavaScript from being counted as valid visitors.

<script type="text/javascript"> 
(function() {
    var pxl = document.createElement('img');
    //no async flag is needed: a dynamically created image loads without blocking the page
    pxl.src = '/track.php';
    pxl.width = 1;
    pxl.height = 1;
    var s = document.getElementsByTagName('script')[0]; 
    s.parentNode.insertBefore(pxl, s);
})();
</script>

Now on page load your tracking script is called, successfully incrementing the unique and pageview counts.

Advanced Tracking

Pageviews and uniques are not particularly compelling information these days. Luckily, you can get much more advanced with your Redis implementation. Redis doesn't support select queries to gather information. Instead, you can use a structured key/value convention that allows you to intelligently retrieve information.

Consider the following code snippet, which could be executed after the above track.php code. The first step is to increment a global visitor id counter in Redis; the resulting id is what all additional tracking values are attached to. Because this runs on every request, each id identifies a single tracked page view. Next, we set keys built from "uid", the visitor id, and the name of the value we want to track. The values below ($client_ip_address, $client_user_agent, $request_page, $request_time) can be set prior to this call using values in PHP's $_SERVER array (not shown). Finally, we push the visitor id onto a list whose key is 'visits:' followed by the current day.


//create a visitor id
$visitor_id = $redis->incr('globals:visitor');

//update data with visitor id
$redis->set('uid:'. $visitor_id . ':ip' , $client_ip_address);
$redis->set('uid:'. $visitor_id . ':agent' , $client_user_agent);
$redis->set('uid:'. $visitor_id . ':page' , $request_page);
$redis->set('uid:'. $visitor_id . ':time' , $request_time);

//push visitor onto today's stack
$redis->lpush('visits:' . $day, $visitor_id);
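
The four request values used in the snippet above are not defined there. One possible way to fill them in is sketched below. The helper name collect_request_values is hypothetical, and taking the page from the Referer header is an assumption: the pixel request's own URI is track.php, not the page being viewed, so the referring page is the closest thing available on the server side.

```php
<?php
// Hypothetical helper (not part of the original scripts): pulls the tracked
// values out of a $_SERVER-style array, falling back to safe defaults when a
// key is missing (for example, when a client sends no user agent).
function collect_request_values(array $server)
{
    return array(
        'ip'    => isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '',
        'agent' => isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '',
        // Assumption: the viewed page is the referrer of the pixel request.
        'page'  => isset($server['HTTP_REFERER']) ? $server['HTTP_REFERER'] : '',
        'time'  => isset($server['REQUEST_TIME']) ? (int) $server['REQUEST_TIME'] : time(),
    );
}

// Populate the variables that the tracking snippet expects.
$values = collect_request_values($_SERVER);
$client_ip_address = $values['ip'];
$client_user_agent = $values['agent'];
$request_page      = $values['page'];
$request_time      = $values['time'];
```

These assignments would need to run before the set calls. The Referer header can be missing or stripped by the browser, so an empty page value should be expected for some visits.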

So what can you do with the above advanced tracking code? You can use lrange to return an array of all visitor id values for a specific day with the following code.

$visitor_array = $redis->lrange('visits:' . $day, 0, -1);

Now that you have all the visitor ids, you can loop through them and use each one to retrieve the client values we set earlier. From there you have the original values for every visit and can filter, group, or report on them however you like.

foreach ($visitor_array as $visitor_id) {
        $user_ip_address = $redis->get('uid:' . $visitor_id . ':ip');
        echo "IP address: " . $user_ip_address . " <br>";
}
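
As one illustration of what this loop makes possible, the sketch below tallies how many tracked visits each page received. The function name count_views_by_page is hypothetical. It assumes only that $redis exposes the get call used above, and it issues one get per visitor id, which is acceptable for a sketch but slow for very long lists.

```php
<?php
// Hypothetical helper (not part of the original scripts): counts tracked
// visits per page for a list of visitor ids.
// $redis is any client exposing get(), such as the Predis client used above.
function count_views_by_page($redis, array $visitor_ids)
{
    $pages = array();
    foreach ($visitor_ids as $visitor_id) {
        $page = $redis->get('uid:' . $visitor_id . ':page');
        // Skip ids whose page key is missing or empty.
        if ($page !== null && $page !== '') {
            $pages[] = $page;
        }
    }
    // Map each distinct page to the number of times it occurred.
    $counts = array_count_values($pages);
    // Sort by count, highest first, keeping the page names as keys.
    arsort($counts);
    return $counts;
}
```

It could be called as count_views_by_page($redis, $visitor_array), using the array returned by lrange earlier.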

As mentioned before, Redis doesn't allow traditional select statements to query data. Redis' strength is its lightning-fast insert performance and scalability. A robust tracking implementation could insert data into Redis on page load and then sync it into MySQL at regular intervals, allowing better querying and access to data.
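
A sync step of that kind might look roughly like the sketch below, which could be run from cron. It is not the code used on the project: the function names, the visits table, and its column names are all assumptions made for illustration, and it uses only the lrange and get calls shown earlier plus PHP's PDO extension.

```php
<?php
// Hypothetical sync step: reads one day's visits out of Redis and turns
// them into rows suitable for a relational table.
function build_visit_rows($redis, $day)
{
    $rows = array();
    foreach ($redis->lrange('visits:' . $day, 0, -1) as $visitor_id) {
        $prefix = 'uid:' . $visitor_id . ':';
        $rows[] = array(
            'visitor_id' => (int) $visitor_id,
            'ip'         => $redis->get($prefix . 'ip'),
            'agent'      => $redis->get($prefix . 'agent'),
            'page'       => $redis->get($prefix . 'page'),
            'visit_time' => $redis->get($prefix . 'time'),
        );
    }
    return $rows;
}

// Inserts the rows with a prepared statement and returns how many were
// written. The `visits` table and its columns are assumed for this sketch.
function insert_visit_rows(PDO $pdo, array $rows)
{
    $stmt = $pdo->prepare(
        'INSERT INTO visits (visitor_id, ip, agent, page, visit_time)
         VALUES (:visitor_id, :ip, :agent, :page, :visit_time)'
    );
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
    return count($rows);
}
```

A real job would also need to remember which days or ids it has already copied, so that repeated runs do not insert duplicates.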