Posted on March 30, 2008 in MySQL, PostgreSQL by Elliott BrueggemanNo Comments »

Overview

After adding PostgreSQL compatibility to my PHPSimpleChat library, I became interested in its performance versus MYSQL. In my workplace, a large New York based media corporation, MYSQL is quickly becoming the database of choice ahead of the costly Sybase. MYSQL certainly has the momentum of the industry behind it, but what about PostGreSQL? Having advanced features, a solid community following, and easy installation on all major operating systems, PostGreSQL shouldn’t be left out of the race.

Performance

The feature sets of MYSQL 5 and PostGreSQL 8 are roughly equal when using MYSQL’s InnoDB table type, with PostGreSQL having just a few more features. After meeting my needs feature-wise, my next biggest concern is performance. The sites I work with are often content management system (cms) based in popular platforms like Drupal and Wordpress. Complicated Drupal sites can execute several thousand queries for a logged in user for each page view and need a high-performance database to support it. Below, I’ve setup several tests using a single database connection to benchmark both MYSQL and PostGreSQL executing the same queries. In addition to testing MYSQL’s InnoDB table type, which uses row level locking, I also tested MYSQL’s MyISAM and Memory table types. MyISAM was the standard before InnoDB, but uses table level locking, which can cause backups in a highly active insert and update environments. Memory tables are stored entirely in memory, requiring no disk access for queries, but are truncated on every server restart.

Test 1: Select = int with integer index

integer select testSetup: 1000 row table setup identically in both MYSQL and PostGreSQL with an auto-incrementing primary key column (serial type column in PostGreSQL) with 1000 consecutive selects of random rows.

Results: Memory tables were the fastest type, though not by much compared to InnoDB’s other offerings. PostGreSQL’s selects lagged behind.

Test 2: Select = ’string’ with indexed varchar column

varchar select testSetup: 1000 row table setup identically in both MYSQL and PostGreSQL with an auto-incrementing primary key column (serial type column in PostGreSQL) and a varchar(32) (character varying in PostGreSQL) populated with random 2 character strings. The test does 1000 consecutive selects on the varchar column looking for random 2 character matches.

Results: Memory tables were the fastest type again, slightly ahead of InnoDB. MyISAM was noticeably slower in indexed string comparisons than it was with integer selects. PostGreSQL’s again lagged behind.

Test 3: Select like ‘%string%’ with indexed varchar column

varchar wildcard select testSetup: 1000 row table setup identically in both MYSQL and PostGreSQL with an auto-incrementing primary key column (serial type column in PostGreSQL) and a varchar(32) (character varying in PostGreSQL) populated with random 8 character strings. The test does 1000 consecutive selects on the varchar column looking for random 2 character matches within the 8 characters using the wildcard operator.

Results: Memory tables fastest. PostGreSQL surprisingly was the next fastest by a significant percentage over InnoDB and MyISAM, with InnoDB being the slowest by far.

Test 4: Inserts

database insert testSetup: I did the insert test for inserting both integers and varchars into a table with an additional auto-incrementing primary key column, but the results were so similar that I’m only going to show the results from the integer insert test. The test did 100 consecutive inserts into the table.

Results: Memory tables were predictably the fastest, followed by MyISAM, and then PostGreSQL. Its worth noting that the sequence of PostGreSQL inserts were five times slower than MyISAM and Memory tables. The InnoDB inserts are very slow to the other table types, but its worth noting that is because of the row-level locking. In a highly active read and write environment, InnoDB may actually perform better than the other table types. This is because it does not have to wait for the insert to finish, when a select comes into the queue. With table types that lock on the table level, select queries can get backed up waiting for each insert to complete.

Overall Results

Memory tables are the fastest overall, but are impractical in many situations where data-integrity is a concern. Applications like chat rooms, or quick lookup apps may be ideal for memory tables. MySQL’s default database in version 5 is InnoDB because of its blend of performance and feature set. MyISAM may be faster in some situations, but with multiple request threads in high-write environments, InnoDB is usually faster. PostGreSQL’s performance was surprisingly unimpressive with the sole exception of a varchar wildcard lookup. Unless the performance of PostGreSQL significantly increases with its next release, MYSQL will remain my open-source database of choice.

Posted on March 20, 2008 in PHP by Elliott BrueggemanNo Comments »

Again?

Awhile ago I posted a PHP Execution class article for timing various PHP elements. Since then, a few developers have requested that I include the ability to start and start the timer, so you can be more precise with elements that you want to time and elements that you don’t. I decided to code this class in full PHP 5+ object-oriented structure. In addition to increased functionality, sticking to this style of programming supports PHP as an enterprise level language, and helps its standing among other more weathered and commercially used languages.

The Timer Script

This script returns the time in ms (milliseconds) rounded to the nearest ms when used in a php script or web page.

<?php
class Timer{
 
  private $start_time;
  private $end_time;
  private $accumulated_time;
 
  //Public constructor
  public function __construct() {
    $this->start_time = NULL;
    $this->end_time = NULL;
    $this->accumulated_time = 0;
  }
 
  //Public functions
  public function start() {
    if ($this->is_stopped()) {
      //add time so far to accumulated time
      //reset end time and set start time
      $this->accumulated_time += ($this->end_time - $this->start_time); 
      $this->start_time = $this->get_timestamp();
      $this->end_time = NULL;
    }
    else if (!$this->is_started()) {
      $this->start_time = $this->get_timestamp();
    }
  }
 
  public function stop() {
    if ($this->is_started()) {
      $this->end_time = $this->get_timestamp();
    }
  }
 
  public function reset() {
    $this->__construct(); 
  }
 
  public function retrieve() {
    $this->stop();
    return round($this->accumulated_time + ($this->end_time - $this->start_time));
  } 
 
  //Private functions
  private function is_started() {
    //if start is numeric but end is null, we are started
     if(is_numeric($this->start_time) && is_null($this->end_time)) {
      return true;
    }
    return false;
  }
 
  private function is_stopped() {
    //if end time is numeric, we have a stopped timer
    if(is_numeric($this->end_time)) {
      return true;
    }
    return false;
  }
 
  private function get_timestamp() {	
    //retrieve seconds and microseconds (one millionth of a second)
	//multiply by 1000 to get milliseconds
    $timeofday = gettimeofday();
    return 1000*($timeofday['sec'] + ($timeofday['usec'] / 1000000));
  } 
}
?>

Usage

Using this script is relatively easy as you can probably see from the function names. The 4 main functions in addition to the constructor are start(), stop(), retrieve(), and reset(). Here is an example of usage of this class:

<?php
//instantiate the timer object
$timer = new Timer();
 
//start the timer
$timer->start();
 
//... do something worth timing
 
for($i=0;$i<10000000;$i++){}
 
//stop the timer
$timer->stop();
 
//... do something NOT worth timing
 
//start the timer again
$timer->start();
 
//... do something worth timing
 
//stop the timer
$timer->stop();
 
//print the time accumulated
echo $timer->retrieve();
Posted on March 9, 2008 in PHP by Elliott BrueggemanNo Comments »

MY Common PHP Includes

When I start a new application or website, I always start with the same framework of files. I have a series of include files that I include as necessary on pages (scripts) on the site. Stuff like site navigation, database connection, and core functions are some of the includes I use.

My functions include has all the functions that I’m confident are going to be used frequently enough on the site to warrant their inclusion (and subsequent declaration) in every script in the site. While most of the functions vary from site, because they relate to a specific feature or concept that the site incorporates, there are two functions that I always include.

My Get and Post Functions

All my sites and applications are dynamic, which means they receive input from the user using either the GET or POST method. If you incorporate forms into your site, you’re probably familiar with GET and POST and the differences between the two. One thing I noticed after some time programming was that handling data through GET and POST was done frequently and could be easily compartmentalized in two simple helper functions.

Below is my implementation of these functions:

<?php
function getData($var, $maxLength=99999) {
  $value = NULL;
  if (isset($_GET[$var])) {
    $value = trim(htmlentities($_GET[$var], ENT_QUOTES));
    if (strlen($value) > $maxLength) {
      $value = substr($value, 0, $maxLength);
    }
  }
  return $value;
}
 
function postData($var, $maxLength=99999) {
  $value = NULL;
  if (isset($_POST[$var])) {
    $value = trim(htmlentities($_POST[$var], ENT_QUOTES));
    if (strlen($value)>$maxLength) {
      $value = substr($value, 0, $maxLength);
    }
  }
  return $value;
}
?>

These functions act as helper functions to retrieve the value that was sent to the script and available to PHP’s global $_GET and $_POST arrays. They also implement a few simple extra features. When calling either of these two functions, you can also supply a $maxLength parameter, which would be the maximum expected length of that variable. If anything is longer, then someone may be trying to take advantage of your website by sending data not in the way you intended. If the variable is longer than expected, the function still allows it, though it trims it to the length you are expecting. This can be helpful if you are inserting values into a database that has limits on the number of characters in a field. In addition, the function also puts the value through PHP’s htmlentities() function which will encode each character into their html equivalent, which I almost always need. This can also disarm any injected code that a malicious user may be sending to your script. Lastly, the function returns NULL when the expected variable has not been passed. This allows easy error handling when calling this function. Here is an example of doing error handling when retrieving a function:

<?php
//Attempt to retrieve the value
$username = getData('username');
 
if ($username == false) {
  //This value does not exist, do something to handle this situation
}
else {
  echo 'Welcome ' . $username . '!';
}
?>