Archive

SEO Database Website

Intended Audience

In today’s fast paced world, the importance of Search Engine Optimisation and Usability has been thrust into the limelight. More and more companies and employers are hearing the buzz that is SEO, and want you to ensure them that their new website will deliver. They all want to be top of the search engines and we all know that that will rarely happen. There are however things we can do to help the issue and to make life a little easier for those lovable bots and spiders who crawl our pages.

Introduction

Whether you believe the fact that search engines discriminate against id numbers in URL strings or not, the fact of the matter is that:

[1] http://www.a_mucky_page.com/products?id=00234987&category=990003429

will never look as nice to the end user or the search engines, as

[2] http://www.a_nice_place_to_be.com/products-modems/speed-baud-5000.html

Getting rid of GET parameters is also an advantage as some search tools will discriminate against pages based on the number of parameters.

This article is a primer to help overcome this problem using PHP / MySQL / Apache and a little pinch of mod_rewrite thrown in for some spice. In some cases this may not be the best solution, but it is always worth considering as mod_rewrite is an extremely powerful tool.

To begin with you will need to have a webserver running apache, with mod_rewrite installed. If you own the server, this is going to be an easy, but if not you may have trouble getting the administrators to install it. The reasons for this are simple, the administrators are not trying to annoy you, but one false move inside mod_rewrite code and the CPU load goes through the roof. This article assumes you already have some knowledge of L.A.M.P. systems (Linux, Apache, MySQL, PHP), and that you have mod_rewrite installed.

Structuring

Structuring your database is an important step in the process and must be carefully thought about, however it maybe that your database is well enough structured already in which case, well done. Keeping your data tidy not only helps you but will automatically make for a better SEO system. We are going to use the example of a single tier categorisation system. By that we mean that products are sorted into a single category which fits the best. We will have two tables, one for categories and one for products. Their structure will be as in the tables below. You can see that the product table is linked to the category table through the category_id field.

Category id title description  
Products id category_id title description price

Preliminary Coding

Let us write some PHP code to pull out the record information based on the URL line [1] above. We have the two parameters we need, product and category. The customer wants the category information listed on the page with the product. Code for this may look like the example below.

<?php
//Setup Database mysql_connect(127.0.0.1, blark_inc, my_password); 
mysql_select_db(blark_inc);  
//Get the GET parameters 
$category=$_GET['category']; 
$id=$_GET['id'];  
//Call in data for category and products 
$category  =  mysql_fetch_array(mysql_query('SELECT * FROM category WHERE id="'.$category.'"')); 
$product   =  mysql_fetch_array(mysql_query('SELECT * FROM products WHERE id="'.$id.'"'));  
//Output the information 
echo 'Category Name : '.$category['title'].'<<BR>>'; 
echo 'Category Description : '.$category['description'].'<<BR>>'; 
echo 'Product Name : '.$product['title'].'<<BR>>'; 
echo 'Product Description : '.$product['description'].'<<BR>>'; 
?>

So in our simple example, the id’s of both tables are called with the GET method and their relevant information pulled from the database and output to the end user. As stated previously though, this method uses a URL string that does not look pretty in the least, rather it’s hideous. You may argue that the category id does not need to be stated, and whilst this is true for a single tier categorisation system, other more complicated types of system, may well need this capability. Just imagine if a product were to appear in more than one category.

Enter mod_rewrite

It is now that we pick up our pot of spice, with mod_rewrite scrawled in friendly letters on the site and begin add in that little bit extra. Up until now we have not really discussed what mod_rewrite does. We have simply put it forward as an answer to all our problems, so let us delve a little deeper. Mod_rewrite is an apache module that will rewrite urls according to certain rules. In effect a browser will request a file by name eg. my-life.html, and may be returned a completely different file, eg. my-friends-life.php, but it all happens transparently, i.e. the user will still think they are viewing my-life.html. Let us look at a few examples of how this could be used. Please note this is not actual code, just basic examples.

Change all .html to .php 
blark.html would become blark.php 
Would give the effect of a static site, all links would be .html.  
Redirect requests pages under maintenance index.php would become maintain.php 
Could modify a specific filename to point to a maintenance page.  
Use the page name as a GET parameter for another page 
my-information.php would become pages.php?page=my-information 
Useful for having one script to run the show.

Note

mod_rewrite will only modify the URL if Apache is used to collect the file. It makes no difference to files stored on your webserver. For example, includes in PHP will remain unaffected as they do not obtain the file through the HTTP protocol but use the local file system.

It is the last one of these examples which is of interest to us. Let us make a simple .htaccess file to test out some of the examples we have just written. The .htaccess file contains all the rules for mod_rewrite within that directory. An examples of which are below.

[[RewriteEngine]] on  
#First example - modify all .html to .php 
[[RewriteRule]] ^(.*).html$ $1.php  
#Second example - modify index.php to maintain.php 
[[RewriteRule]] ^index.php$ maintain.php  
#Third example - Use pagename as a get parameter 
[[RewriteRule]] ^(.*).php$ pages.php?page=$1

Note

If you make a mistake in the .htaccess file and the resulting code mod_rewrite finds invalid, you will just be alerted with an Internal Server error page, Error 500. Do not worry, this is normal. Just alter the line and try again.

The rules follow the format of,

[[RewriteRule]] 
What_I_am_looking_for What_I_want_it_to_become

Mod_rewrite Xplained !

The ‘^’ means start matching the filename from the beginning of the url string after the host. For example with, http://www.my-life.com/test.html, mod_rewrite will only look at what comes after http://www.my-life.com/, ie test.html. The ‘()’ are used to being catching data. Anything which is matched and falls inside these parentheses will be stored. This can then be recalled by using a ‘$1’ in the rewritten expression, as can be seen in the first example. the ‘.*’ within the brackets in the first example catches all characters, and the ‘$’ denotes the end of the URL string. In this example for a file to fit the criteria, it must be a set of characters, followed by ‘.html’ with nothing trailing on the end. When mod_rewrite finds a match, it takes the value inside the brackets and puts it back to work in the rewrite expression ‘$1.php’. $1 means use the data from the first set of brackets. If you had another set of brackets in the matching expression, then using the data from it would mean using ‘$2’ in the rewrite expression.

The second example should be obvious now. It is matches the term index.php exactly, no variations and rewrites it to maintain.php.

Note For full details on mod_rewrite head over to http://httpd.apache.org/docs/mod/mod_rewrite.html

Let’s Bake A Product Database (Bleugh)

We now have everything we need to make our search engine optimised product catalog. All that is needed is to remove those harmful ids and to replace them with something else. Why not make a separate field in the table which can hold a unique identifier of a product but written in text instead of numbers? It may sound like a tiresome task and unnecessary, but it can help you out. We will add a separate field to each table in our database and call it mod_name. This will hold a modified version of the product/category title, and this will be used as a unique identifier. For example the product ‘Speed Baud 5000 Enhanced modem’ may have a mod_name value of ‘speed-baud-5000’. It is up to you how you create the mod_name. It may be that you want to type each one in individually, or it may be that you use a simple php script to translate one into the other.

Mod_rewrite to the rescue

We now have to create a mod_rewrite rule that will interface with the code we wrote previously as closely as possible. Obviously now that we are using mod_names instead of id’s to call records there will have to be some changes, but the structure of it, should remain the same. Referring to the above code, it should be clear that we are expecting two parameters. One called ‘id’ and the other called ‘category’. Let us sculp a mod_rewrite expression that fulfils these criteria.

To make the URL user-friendly I have chosen the format of www.blark_inc.com/products-(category name)/(product_name).html

[[RewriteRule]] 
^products-(.*)/(.*).html$ 
products.php?category=$1&id=$2

This expression will take two pieces of data. The first is the word, or character string after the ‘products-‘ and the second is the name of the page in this phantom directory. Remember that the /products-whatever/ directory does not even exist, rather it is being used to fool the user and search engines into thinking that the site is structured in that manner.

To take our example from before,

http://www.blark_inc.com/products-modems/speed-baud-5000.html

will be magically and invisibly transformed into

http://www.blark_inc.com/products.php?category=modems&id=speed-baud-5000

See how easy it is!

We now need to make a few changes to the products page code in order for it to pull the records out of the database. All that needs to be changed are the field names in the database query lines.

$category = mysql_fetch_array(mysql_query('SELECT * FROM category WHERE id="'.$category.'"')); 
$product = mysql_fetch_array(mysql_query('SELECT * FROM products WHERE id="'.$id.'"'));

becomes

$category = mysql_fetch_array(mysql_query('SELECT * FROM category WHERE mod_name="'.$category.'"')); 
$product = mysql_fetch_array(mysql_query('SELECT * FROM products WHERE mod_name="'.$id.'"'));

and thus the total code becomes:

<?php 
//Setup Database mysql_connect(127.0.0.1, blark_inc, my_password); 
mysql_select_db(blark_inc);  
//Get the GET parameters $category=$_GET['category']; 
$id=$_GET['id'];  
//Call in data for category and products 
$category  =  mysql_fetch_array(mysql_query('SELECT * FROM category WHERE mod_page="'.$category.'"')); 
$product   =  mysql_fetch_array(mysql_query('SELECT * FROM products WHERE mod_page="'.$id.'"'))  
//Output the information echo 'Category Name : '.$category['title'].'<<BR>>'; 
echo 'Category Description : '.$category['description'].'<<BR>>'; 
echo 'Product Name : '.$product['title'].'<<BR>>'; 
echo 'Product Description : '.$product['description'].'<<BR>>'; 
?>

Summary

As mentioned previously this is a nice way to make your site look well structured to both user and search engine. It has been mentioned that mod_rewrite does take a little more CPU load to run as if there were no mod_rewrite at all, but I personally have never had a problem with it and providing it is used in the right way and not used to solve every mis-extensioned file, it should not be discounted and should form a part of your php toolkit.

About The Author

Pete has been programming since the age of 10 on an old Atari 800 XE. Though he took an Acoustical Engineering degree from the renowned ISVR in Southampton UK. The call of programming brought him back and he has been working as a web develoer/IT administrator ever since. He uses both Linux and Windows platforms. He still lives in the UK and is currently living happily with his wife. Comments, suggestions to debug@silentkeystroke.co.uk

Leave a Reply