404 Error Handling

I’ve noticed on another site I run (one with fairly extensive user tracking) that users end up on 404 pages quite often. They’re not telling me which link on my site is sending them there, so I started looking for ways to get that information without having to bug the users who are hitting the 404.

(so far I have just one link - if this does the trick then I won’t add more - otherwise I’ll add to this post as I find more)

Here’s a 404 handler I found on webmasterengine.com (the script emails you the details of each 404).
(I got this to work after fixing a problem with the way my .htaccess was written - see the comments on this post for details.)

6 Responses to “404 Error Handling”

  1. Yoshi Says:

    I use a similar approach. I used to record each hit to a database as well, but I dropped that. Now I settle for just an email with the referer link (so I know if it’s from a search engine, etc.) and the IP address of the visitor (so I can determine if it’s a bot). I could send you my code if you’d like.

    In the .htaccess, it’s exactly like they say if you’re running Apache:

    ErrorDocument 404 /404.php

    (or whatever you name your 404 file)
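A minimal sketch of the kind of email handler described above. This is not Yoshi's actual code: the function name `build_404_report`, the recipient address, and the subject line are placeholders, and it assumes PHP 7 or later with a working `mail()` setup.

```php
<?php
// 404.php: sketch of an email-reporting 404 handler.

// Build the report from a $_SERVER-style array so it can be tested in isolation.
function build_404_report(array $server): string
{
    $fields = [
        'Requested URL' => $server['REQUEST_URI'] ?? '(unknown)',
        'Referer'       => $server['HTTP_REFERER'] ?? '(none sent)',
        'IP address'    => $server['REMOTE_ADDR'] ?? '(unknown)',
        'User agent'    => $server['HTTP_USER_AGENT'] ?? '(none sent)',
    ];
    $lines = [];
    foreach ($fields as $label => $value) {
        // Replace CR/LF so a crafted header cannot add extra lines to the report.
        $lines[] = $label . ': ' . str_replace(["\r", "\n"], ' ', $value);
    }
    return implode("\n", $lines);
}

// Only send mail when running under a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    http_response_code(404);
    // webmaster@example.com is a placeholder address.
    mail('webmaster@example.com', '404 on my site', build_404_report($_SERVER));
    echo 'Sorry, that page could not be found.';
}
```

The referer and user agent are optional request headers, so the sketch falls back to a placeholder string rather than assuming they are set.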

  2. Paul Roub Says:

    Of course, you can extract a lot of this from your logs if you have referrer logging turned on. However, for ease of reporting and juggling, on one of my higher-traffic sites I keep a simple not-found table:

    notfound_id, int, primary key, auto-increment
    notfound_date, timestamp
    notfound_url, varchar(255)
    notfound_referer, varchar(255)
    notfound_ip, varchar(25)
    notfound_useragent, varchar(255)

    And my 404.php page writes to it as follows:

    // Send the 404 status itself; this works under any server API (PHP 5.4+).
    http_response_code(404);

    // The referer and user agent headers are optional, so default them.
    $url       = $_SERVER["REQUEST_URI"];
    $referer   = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
    $clientip  = $_SERVER["REMOTE_ADDR"];
    $useragent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";

    // assuming an existing mysqli connection in $dbconn; the old mysql_*
    // functions were removed in PHP 7, and a prepared statement handles escaping
    $stmt = mysqli_prepare($dbconn,
        "insert into notfound (notfound_url, notfound_referer, notfound_ip, notfound_useragent) " .
        "values (?, ?, ?, ?)");
    mysqli_stmt_bind_param($stmt, "ssss", $url, $referer, $clientip, $useragent);
    mysqli_stmt_execute($stmt);

  3. Jennifer Says:

    You see - this is where I think something might be screwed up in the way I’m handling 404 errors to begin with. For me - on my 404 page - $_SERVER["REQUEST_URI"] will BE the 404 page. Like it “forgets” what was being asked for - and only remembers the current page - which IS the 404 page…

    But as far as I know - I have the 404 page pointed correctly in my .htaccess… I must be overlooking something!

  4. Jennifer Says:

    A HA!!
    The problem was that I had the .htaccess line like this:

    ErrorDocument 404 http://www.mysite.com/404.php

    instead of

    ErrorDocument 404 /404.php

    Strange. By putting the “full web path” in there - it loses what was “being asked for”… Now maybe I’ll try that email script again…
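This matches how Apache documents the directive: when the ErrorDocument target is a full URL, Apache sends the browser a redirect to that address instead of serving the page internally, so the 404 script runs as a brand new request. A commented comparison of the two forms, using the same placeholder domain as above (only one of the two lines would go in a real .htaccess):

```apache
# Full URL: Apache answers with a redirect to this address, so 404.php
# sees itself in REQUEST_URI and the client gets a redirect status, not a 404.
ErrorDocument 404 http://www.mysite.com/404.php

# Local path: Apache serves the page internally, so REQUEST_URI still
# holds the address that was originally requested.
ErrorDocument 404 /404.php
```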

  5. Yoshi Says:

    Wow, I’ve never encountered that, but I would wager it’s just how your web server is set up. This is good to know!

  6. Jennifer Says:

    WOAH! Well… the good news is that it works… the bad news is that the 404 page gets a lot more traffic than I realized. LOL!! I think I have to make my own script that keeps a list of “missing pages” - but only one entry per missing page (i.e. no duplicates). Perhaps finally, after a long hiatus, this is a chance for me to post a home-grown script… should be simple enough.
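One way the “one entry per missing page” idea could look, as a rough sketch rather than the script Jennifer went on to write. The function name `record_missing_page` and the `missing-pages.json` file are assumptions made for illustration; it keys the list on the requested URL and keeps a hit count plus the most recent referer. It assumes PHP 7 or later.

```php
<?php
// Sketch: keep one entry per missing URL, with a hit count and the latest referer.

function record_missing_page(array $pages, string $url, string $referer = ''): array
{
    if (!isset($pages[$url])) {
        // First time this URL has been seen: create its single entry.
        $pages[$url] = ['hits' => 0, 'last_referer' => ''];
    }
    $pages[$url]['hits']++;
    if ($referer !== '') {
        $pages[$url]['last_referer'] = $referer;
    }
    return $pages;
}

// Only touch the file when running under a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    $file  = __DIR__ . '/missing-pages.json';   // hypothetical location
    $pages = is_file($file) ? json_decode(file_get_contents($file), true) : [];
    $pages = record_missing_page(
        is_array($pages) ? $pages : [],
        $_SERVER['REQUEST_URI'] ?? '',
        $_SERVER['HTTP_REFERER'] ?? ''
    );
    file_put_contents($file, json_encode($pages), LOCK_EX);
}
```

The read-then-write on the file is not atomic, so on a busy site a database is probably the safer store. With a table like Paul's, a unique index on notfound_url plus MySQL's INSERT ... ON DUPLICATE KEY UPDATE would give the same one-row-per-page result.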