404 Error Handling
On another site I run (one with fairly extensive user tracking), I’ve noticed that users quite often end up on 404 pages. They aren’t telling me which link on my site sent them there, so I started looking for ways to get that information without having to bug the particular user(s) who hit the 404.
(So far I have just one link. If it does the trick I won’t add more; otherwise I’ll add to this post as I find more.)
Here’s a 404 handler I found on webmasterengine.com (the script emails you the details of each 404).
(I got it working after fixing a problem with how my .htaccess was written; see the comments on this post for details.)
March 18th, 2004 at 11:57 am
I use a similar approach. I used to have it record to a database as well, but I dropped that. Now I settle for just an email with a referer link (so I know if it’s from a search engine, etc.) and the IP address of the visitor (so I can tell if it’s a bot). I could send you my code if you’d like.
In the .htaccess, it’s exactly like they say if you’re running Apache:
ErrorDocument 404 /404.php
(or whatever you name your 404 file)
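The comment above describes emailing the requested URL, the referer and the visitor’s IP instead of writing to a database. The commenter’s own code isn’t shown, so this is only a minimal PHP sketch of that idea, assuming the local-path `ErrorDocument 404 /404.php` setup above. The function `build_404_report()` and the address `webmaster@example.com` are hypothetical placeholders, and PHP’s built-in `mail()` assumes the server has a working mail transport configured.

```php
<?php
// 404.php: a minimal sketch of an email-on-404 handler (hypothetical, not the commenter's code).
// Assumes Apache is configured with a local path: ErrorDocument 404 /404.php

// Build a plain-text report from a $_SERVER-style array.
function build_404_report(array $server): string
{
    // Referer and user agent are optional headers, so fall back to placeholders.
    $fields = [
        'Requested URL' => $server['REQUEST_URI'] ?? '(unknown)',
        'Referer'       => $server['HTTP_REFERER'] ?? '(none)',
        'Client IP'     => $server['REMOTE_ADDR'] ?? '(unknown)',
        'User agent'    => $server['HTTP_USER_AGENT'] ?? '(none)',
        'Time'          => date('Y-m-d H:i:s'),
    ];
    $lines = [];
    foreach ($fields as $label => $value) {
        $lines[] = $label . ': ' . $value;
    }
    return implode("\n", $lines) . "\n";
}

// Only act on real web requests, so the file can also be loaded from the CLI (e.g. for tests).
if (PHP_SAPI !== 'cli') {
    http_response_code(404);                 // keep the 404 status on the response
    $to = 'webmaster@example.com';           // hypothetical address: replace with your own
    $subject = '404: ' . ($_SERVER['REQUEST_URI'] ?? '(unknown)');
    mail($to, $subject, build_404_report($_SERVER));
    echo '<h1>Sorry, that page could not be found.</h1>';
}
```

The referer line is what tells you whether the visitor came from one of your own pages or a search engine, and the IP and user agent help spot bots, which matches the reasons given in the comment.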
March 18th, 2004 at 12:32 pm
Of course, you can extract a lot of this from your logs if you have referrer logging turned on. However, for easier reporting and sorting, on one of my higher-traffic sites I keep a simple not-found table:
notfound_id, int, primary key, auto-increment
notfound_date, timestamp
notfound_url, varchar(255)
notfound_referer, varchar(255)
notfound_ip, varchar(25)
notfound_useragent, varchar(255)
And my 404.php page writes to it as follows:
<?php
// Send a real 404 status code
http_response_code(404);
// The columns are varchar(255), so trim values to fit; referer and
// user agent are optional headers, so default them to empty strings
$url = substr($_SERVER["REQUEST_URI"], 0, 255);
$referer = substr($_SERVER["HTTP_REFERER"] ?? "", 0, 255);
$clientip = $_SERVER["REMOTE_ADDR"];
$useragent = substr($_SERVER["HTTP_USER_AGENT"] ?? "", 0, 255);
// assuming an existing mysqli connection in $dbconn; the prepared statement
// replaces mysql_escape_string(), which no longer exists in current PHP
$stmt = mysqli_prepare($dbconn,
    "insert into notfound (notfound_url, notfound_referer, notfound_ip, notfound_useragent) values (?, ?, ?, ?)");
mysqli_stmt_bind_param($stmt, "ssss", $url, $referer, $clientip, $useragent);
mysqli_stmt_execute($stmt);
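As the comment notes, much of the same information can be pulled straight from the access log when referrer logging is on. Here is a minimal PHP sketch of that, assuming Apache’s standard “combined” log format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`); the function name `count_404s()` is hypothetical. It groups 404 hits by requested path and referring page, which is the same reporting the table above makes easy.

```php
<?php
// Sketch: tally 404s from an Apache "combined" format access log.
// Assumes the combined LogFormat:
//   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"

// Returns [requested path => [referer => hit count]] for every 404 line.
function count_404s(iterable $lines): array
{
    // Captures: 1 = request path, 2 = status code, 3 = referer ("-" when absent).
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/';
    $counts = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m) || $m[2] !== '404') {
            continue; // not a combined-format line, or not a 404
        }
        $path = $m[1];
        $referer = $m[3];
        $counts[$path][$referer] = ($counts[$path][$referer] ?? 0) + 1;
    }
    return $counts;
}
```

With a real log you might call `count_404s(file('access.log', FILE_IGNORE_NEW_LINES))`, where `access.log` stands in for your server’s actual log path. Sorting the result by hit count shows which broken links matter most.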
March 18th, 2004 at 2:08 pm
You see, this is where I think something might be wrong with the way I’m handling 404 errors in the first place. On my 404 page, $_SERVER["REQUEST_URI"] IS the 404 page. It’s as if it “forgets” what was being asked for and only remembers the current page, which IS the 404 page…
But as far as I know, my .htaccess points to the 404 page correctly… I must be overlooking something!
March 18th, 2004 at 2:16 pm
A HA!!
The problem was that I had the .htaccess line like this:
ErrorDocument 404 http://www.mysite.com/404.php
instead of
ErrorDocument 404 /404.php
Strange. With the “full web path” in there, it loses what was “being asked for”… Now maybe I’ll try that email script again…
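This is documented Apache behavior: when `ErrorDocument` points at a full URL (anything starting with `http`), Apache sends the browser a redirect, so the browser makes a brand-new request for `/404.php` and both the original path and the 404 status are lost. With a local path, Apache serves the error page internally and sets `REDIRECT_URL`, `REDIRECT_QUERY_STRING` and `REDIRECT_STATUS`. Below is a minimal PHP sketch of a hypothetical helper, `original_request()`, that prefers those variables and falls back to `REQUEST_URI`; whether the `REDIRECT_*` values reach `$_SERVER` depends on how PHP is hooked into Apache, which is an assumption here.

```php
<?php
// Sketch: recover the originally requested URL inside an Apache ErrorDocument script.
// Assumption: Apache's REDIRECT_* variables are passed through to $_SERVER.
function original_request(array $server): array
{
    if (isset($server['REDIRECT_URL'])) {
        // Set by Apache when it serves a local ErrorDocument internally.
        $url = $server['REDIRECT_URL'];
        if (($server['REDIRECT_QUERY_STRING'] ?? '') !== '') {
            $url .= '?' . $server['REDIRECT_QUERY_STRING'];
        }
    } else {
        // With a local ErrorDocument, REQUEST_URI still holds the original request.
        $url = $server['REQUEST_URI'] ?? '';
    }
    return [
        'url'    => $url,
        'status' => (int) ($server['REDIRECT_STATUS'] ?? 0), // 0 if Apache didn't set it
    ];
}
```

In `404.php` you would call `original_request($_SERVER)` and log or email the `url` it returns. If that URL is `/404.php` itself, the `ErrorDocument` line is probably still using a full URL.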
March 18th, 2004 at 2:55 pm
Wow, I’ve never run into that, but I’d wager it’s just how your web server is set up. This is good to know!
March 18th, 2004 at 3:08 pm
WHOA! Well… the good news is that it works… the bad news is that the 404 page gets a lot more traffic than I realized. LOL!! I think I have to write my own script that keeps a list of “missing pages”, but with only one entry per missing page (i.e. no duplicates). Perhaps, after a long hiatus, this is finally my chance to post a home-grown script… it should be simple enough.
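The deduplicating script described here isn’t part of the post, so this is just one possible approach: a minimal PHP sketch that keeps one entry per missing URL in a JSON file, with a hit count, a last-seen time and the distinct referers. The function `record_missing_page()` and the file name are hypothetical. A flat file guarded with `flock()` is adequate for modest traffic; on a busier site, the `notfound` table from the earlier comment with a unique key on the URL (updated via MySQL’s `INSERT ... ON DUPLICATE KEY UPDATE`) would do the same job.

```php
<?php
// Sketch: keep one entry per missing page (no duplicates) in a JSON file.
// Each entry records a hit count, the last time it was seen, and distinct referers.
function record_missing_page(string $file, string $url, string $referer = ''): array
{
    $fh = fopen($file, 'c+'); // create if missing, don't truncate, pointer at start
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    flock($fh, LOCK_EX); // serialize concurrent 404 requests
    $data = json_decode(stream_get_contents($fh), true) ?: [];

    // Keyed by URL, so repeat hits update the existing entry instead of adding a new one.
    $entry = $data[$url] ?? ['hits' => 0, 'referers' => []];
    $entry['hits']++;
    $entry['last_seen'] = date('c');
    if ($referer !== '' && !in_array($referer, $entry['referers'], true)) {
        $entry['referers'][] = $referer; // remember each referring page once
    }
    $data[$url] = $entry;

    // Rewrite the whole file with the updated list, then release the lock.
    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES));
    fflush($fh);
    flock($fh, LOCK_UN);
    fclose($fh);
    return $data;
}
```

From `404.php` you might call `record_missing_page('/path/outside/webroot/missing-pages.json', $_SERVER['REQUEST_URI'] ?? '', $_SERVER['HTTP_REFERER'] ?? '')`, where the path is a placeholder; keeping the file outside the web root stops visitors from reading it.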