PHP Site Access Log

The Basics

Why do you need a site access log? The answer is easy: to see who visits your site. If you are a site owner with a hosted web site, you have probably seen web stats of some kind compiled by your hosting company. How do they do it? Often, they do it by analyzing your site's access log.

A site access log is a record of all access to your site. Basically it is a text file and each line is a record of a visitor's viewing of a particular page. Typical site access logs include the name of the page accessed, a user's IP address, the time of viewing, and some other useful information.

By looking at a site access log, you can see how a visitor moves through your site. You can see how many seconds it takes for a visitor to go from one page to another. You can see which pages are navigated to and which ones are avoided.

The Script

This handy site access log PHP code will create one text file for each day of logging in a folder called logs. Note: you have to make a "logs" folder in your base directory for this to work, and the web server must be allowed to write to it. The directory is not created automatically. Place this code in any page you want to be recorded in the log file.

<?php

//ASSIGN VARIABLES TO USER INFO
$time = date("M j G:i:s Y"); 
$ip = getenv('REMOTE_ADDR');
$userAgent = getenv('HTTP_USER_AGENT');
$referrer = getenv('HTTP_REFERER');
$query = getenv('QUERY_STRING');

//COMBINE VARS INTO OUR LOG ENTRY
$msg = "IP: " . $ip . " TIME: " . $time . " REFERRER: " . $referrer . " SEARCHSTRING: " . $query . " USERAGENT: " . $userAgent;

//CALL OUR LOG FUNCTION
writeToLogFile($msg);

function writeToLogFile($msg) {
     $today = date("Y_m_d"); 
     $logfile = $today."_log.txt"; 
     $dir = 'logs';
     $saveLocation=$dir . '/' . $logfile;
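     // Open in append mode; @ suppresses any warning if the open fails.
     if (!$handle = @fopen($saveLocation, "a")) {
          // Logging is secondary: give up quietly and let the page continue.
          return;
     }

     @fwrite($handle, "$msg\r\n");
     @fclose($handle);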
}

?>
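
The script above does not record which page was requested, although the page name is one of the typical fields of an access log. Here is a minimal sketch of one way to add it, assuming the server populates REQUEST_URI. The buildLogEntry name and the PAGE label are made up for this example and are not part of the original script.

```php
<?php

// Hypothetical helper: builds one log line from a $_SERVER-style array.
function buildLogEntry(array $server, string $time): string
{
    $fields = [
        'IP'           => $server['REMOTE_ADDR'] ?? '',
        'TIME'         => $time,
        'PAGE'         => $server['REQUEST_URI'] ?? '',
        'REFERRER'     => $server['HTTP_REFERER'] ?? '',
        'SEARCHSTRING' => $server['QUERY_STRING'] ?? '',
        'USERAGENT'    => $server['HTTP_USER_AGENT'] ?? '',
    ];

    $parts = [];
    foreach ($fields as $label => $value) {
        // Replace line breaks so a client-supplied value cannot split one entry across lines.
        $clean = str_replace(["\r", "\n"], ' ', (string) $value);
        $parts[] = $label . ': ' . $clean;
    }

    return implode(' ', $parts);
}

// In a page, the result would be handed to writeToLogFile() from the script above:
// writeToLogFile(buildLogEntry($_SERVER, date("M j G:i:s Y")));
```

Keeping the entry builder separate from the file writing also makes it easy to change the fields later without touching the code that opens the log file.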

Script Notes

You can set up your $msg variable any way you want to, but careful thought should be given to how you are going to parse it later. Inserting a common delimiter, such as a symbol between the elements of the log entry, can make it easier to parse. For example, examine the following $msg assignment:

$msg = "IP: " . $ip . " ##TIME: " . $time . " ##REFERRER: " . $referrer . " ##SEARCHSTRING: " . $query . " ##USERAGENT: " . $userAgent; 

Had we done this, we could separate the elements easily with PHP's explode() function, or preg_split() when a pattern is needed (the older split() function was removed in PHP 7), by splitting each line on the string ##. Why did I use two # characters and not one? A single # marks an anchor in a URL, so it can turn up in values such as the referrer or the query string. Two in a row are unlikely but still possible, so it would be a better idea to split on a random string that you'll probably never see naturally, such as *%^%*.
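
As a rough illustration of the parsing step, the sketch below splits one entry written with the ## delimiter into a label => value array. The parseLogLine name and the sample line are made up for this example.

```php
<?php

// Hypothetical helper: turns "IP: 1.2.3.4 ##TIME: ..." into ['IP' => '1.2.3.4', 'TIME' => ...].
function parseLogLine(string $line, string $delimiter = '##'): array
{
    $entry = [];
    foreach (explode($delimiter, rtrim($line, "\r\n")) as $part) {
        // Split on the first ": " only, because values such as the time contain colons too.
        $pieces = explode(': ', $part, 2);
        if (count($pieces) === 2) {
            $entry[trim($pieces[0])] = trim($pieces[1]);
        }
    }
    return $entry;
}

// Made-up sample entry in the delimited format.
$sample = "IP: 203.0.113.7 ##TIME: Jan 5 14:03:22 2024 ##REFERRER: http://example.com/ "
        . "##SEARCHSTRING: id=4 ##USERAGENT: Mozilla/5.0\r\n";

$entry = parseLogLine($sample);
echo $entry['IP'], "\n";
```

Passing a different second argument, such as '*%^%*', would parse entries written with that delimiter instead.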

You could also write your log files in the common logfile format, which the major web servers support. A benefit is that there are downloadable scripts and programs that can analyze your logs for you if you use this format. The format is defined on the W3C site. I prefer to write my own log analysis scripts, however, so I do not use the common logfile format.
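
For readers who do want the common logfile format, each line has the shape: host ident authuser [date] "request" status bytes. The sketch below is one possible way to produce such a line from PHP. The formatCommonLogLine name is hypothetical, and the ident, authuser and bytes fields are written as "-" because this sketch does not try to determine them.

```php
<?php

// Hypothetical helper: formats one entry in the common logfile format.
function formatCommonLogLine(array $server, int $timestamp, int $status = 200): string
{
    $host    = $server['REMOTE_ADDR'] ?? '-';
    $request = sprintf(
        '%s %s %s',
        $server['REQUEST_METHOD'] ?? 'GET',
        $server['REQUEST_URI'] ?? '/',
        $server['SERVER_PROTOCOL'] ?? 'HTTP/1.0'
    );

    // "-" stands in for the fields this sketch leaves out: ident, authuser and bytes sent.
    return sprintf(
        '%s - - [%s] "%s" %d -',
        $host,
        date('d/M/Y:H:i:s O', $timestamp), // e.g. 05/Jan/2024:14:03:22 +0000
        $request,
        $status
    );
}

echo formatCommonLogLine($_SERVER, time()), "\n";
```

The status code defaults to 200 here as an assumption; a page that sends a different status would need to pass it in.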

You'll also notice the @ symbol before the calls to the fopen(), fwrite(), and fclose() functions. This is PHP's error suppression operator, and it prevents PHP from displaying any errors raised by these function calls. Logging is secondary to the main goal of the page, which is to provide the viewer with some sort of service or information, so showing visitors an error message about a failed log write is unnecessary and unprofessional. Without the @, everything works for now, but if you delete the logs folder at some point in the future, a nasty error will appear on every page containing this script. Rather than expose visitors to that, just check occasionally that new log files are still being created. We could also have checked that the folder exists before calling those functions, but it is difficult to test every possible point of failure, and costly to do so on every page request.
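
For comparison, here is a sketch of what checking the folder first might look like. It is not the article's script: the tryWriteLog name is hypothetical, and it swaps the fopen(), fwrite(), and fclose() calls for file_put_contents() with the FILE_APPEND and LOCK_EX flags.

```php
<?php

// Hypothetical variant of the log writer: reports failure through its return
// value instead of raising warnings or stopping the page.
function tryWriteLog(string $dir, string $msg): bool
{
    if (!is_dir($dir) || !is_writable($dir)) {
        return false; // nowhere to write; the page carries on regardless
    }

    $file = $dir . '/' . date('Y_m_d') . '_log.txt';

    // FILE_APPEND adds to the end of the file; LOCK_EX takes an exclusive lock while writing.
    return @file_put_contents($file, $msg . "\r\n", FILE_APPEND | LOCK_EX) !== false;
}
```

The two checks run on every page request, which is the cost mentioned above. Whether that cost matters depends on how busy the site is.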

When creating the log file, we name it with date("Y_m_d"). It is important to start the file name with the year and not the month or day. If the name started with the month, a year from now the logs would no longer sort chronologically by name, because the file system would group all the December (12) files together and all the June (06) files together, regardless of year. Starting with the year, followed by the month and then the day, allows easy natural date sorting by file name.
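
The difference is easy to see with a plain string sort. The file names below are made up for this example.

```php
<?php

// Sample file names in year-first order and in month-first order.
$yearFirst  = ['2024_06_15_log.txt', '2023_12_01_log.txt', '2024_01_20_log.txt'];
$monthFirst = ['06_15_2024_log.txt', '12_01_2023_log.txt', '01_20_2024_log.txt'];

// Plain string comparison, similar to a directory listing sorted by name.
sort($yearFirst, SORT_STRING);
sort($monthFirst, SORT_STRING);

print_r($yearFirst);  // chronological: 2023_12_01, 2024_01_20, 2024_06_15
print_r($monthFirst); // not chronological: 01_20_2024, 06_15_2024, 12_01_2023
```

In the month-first list, the December 2023 file ends up last even though it is the oldest.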

Now What

Now that you have a text file full of page access records, you can do quite a few things. You can parse the entries and count the unique IP addresses that visited your site. You can calculate the average time per visitor, or the number of page views per IP address. The details are beyond the scope of this article, but they are easily doable with a little PHP tinkering.
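
As a starting point only, the sketch below counts entries per IP address in a single log file, which also gives the number of unique addresses. It assumes each line begins with "IP: " followed by the address, as in the script above. The countViewsPerIp name and the file name in the usage comment are made up.

```php
<?php

// Hypothetical helper: returns [ip => number of entries] for one log file,
// with the busiest addresses first.
function countViewsPerIp(string $path): array
{
    $counts = [];
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $counts; // unreadable or missing file: report nothing
    }

    foreach ($lines as $line) {
        // Capture the address that follows the leading "IP: " label.
        if (preg_match('/^IP: (\S+)/', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }

    arsort($counts); // sort by count, highest first, keeping the IP keys
    return $counts;
}

// Usage, with a made-up file name:
// $views = countViewsPerIp('logs/2024_01_05_log.txt');
// echo count($views), " unique IP addresses\n";
```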