scriptygoddess

15 Apr, 2003

404 Search Page…

Posted by: Jennifer In: Bookmarks

Erik created a wonderful 404 search function. It requires only PHP.

Erik's post explains what it does.

I think this one would be a useful feature to add to scripty…

[Thanks Etan for the tip!]

Update 4/25 – as you can see in the comments section here, there's some debate about this script. I'm still leaving this post up (because I think this script COULD be made to work better), but based on this story, I do not recommend installing it until then (or unless you're really sure you know what you're doing).
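
For anyone trying to follow along, this is the general shape of the technique, sketched from the discussion below rather than copied from Erik's code. It assumes Apache, whose ErrorDocument directive hands 404s to a PHP page; the search URL, file names, and parameters are placeholders for your own setup.

    # .htaccess – hand all 404s to the search page
    ErrorDocument 404 /404search.php

    <?php
    // 404search.php – a minimal sketch of the idea, NOT Erik's script.
    // $search_url is a placeholder; point it at your own search
    // (for MT, that's mt-search.cgi with your install's parameters).
    $search_url = '/cgi-bin/mt/mt-search.cgi?IncludeBlogs=1&search=';

    // When Apache invokes an ErrorDocument internally, REQUEST_URI
    // still holds the URL the visitor originally asked for.
    $requested = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';

    // Turn the path into search terms: drop the file extension, then
    // treat any run of non-alphanumerics as a word separator.
    $term = basename($requested);
    $term = preg_replace('/\.[a-z0-9]+$/i', '', $term);
    $term = preg_replace('/[^a-z0-9]+/i', ' ', $term);

    header('Location: ' . $search_url . urlencode(trim($term)));
    exit;
    ?>

This version just redirects to the search; as Michael notes in the comments, the real script fetches the results page and pulls the links into the 404 page itself, but the overall flow is the same.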

19 Responses to "404 Search Page…"

1 | Amy

April 16th, 2003 at 1:46 am

I'm not an MT user, so you guys will have to verify this on your own. Unless MT has some kind of string-escaping done before the db is queried, that bit of PHP linked to in this entry comes built-in with its own yummy potential security hole.

Never, never, never, NEVER allow users to query your database without making sure that the user-submitted data is mysql-safe. Even a simple

$full_search_url = $search_url . mysql_escape_string($search_term);

would make me stop growling in this instance.

Ok, rant over. I will now return you to your normal kind, gentle domesticat.

2 | Amy

April 16th, 2003 at 1:58 am

Ok. Adam (gessaman.com) confirms that mt-search escapes strings, which is comforting. The other thing that bothers me about this (I realize I'm playing devil's advocate here) is that doing it this way makes it almost too easy to bring an MT-powered website to a screeching halt.

Think about it – any string appended to the URL is going to trigger a database query. It would be amazingly easy to write a bot to exploit that: real URL + garbage-string requests, rapid-fired over and over… and every one of them will cause a db lookup.

On sites that are prepared to handle this kind of traffic, like php.net, this won't be as big a deal, but on a personal site hosted on an ISP machine with shared bandwidth… that could get your site shut down pretty quickly should someone choose to hammer it.

At the very least, perhaps a sleep() delay should be added.

I think the idea's got potential, but I don't think it's ready for prime-time yet.
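
For what it's worth, the sleep() delay Amy suggests is about one line; a sketch, dropped in before any query runs:

    <?php
    // Hypothetical speed bump for the 404 search page: every request
    // pays a few seconds before the database is touched.
    sleep(3);
    // ... build and run the search as before ...
    ?>

The catch is that sleep() ties up an Apache child for the duration, so it slows a single-connection bot down but doesn't shed load from parallel requests; it's a speed bump, not a fix.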

3 | Etan

April 16th, 2003 at 9:28 am

If someone wanted to DoS your site via mt-search.cgi, they could just do so anyway, by hitting mt-search.cgi directly.

4 | Amy

April 16th, 2003 at 11:42 am

Oh, I don't doubt that they could. I'm just mystified as to why you'd want to make it easier for them.

5 | Etan

April 16th, 2003 at 12:32 pm

"I'm just mystified as to why you'd want to make it easier for them."

I don't really have a fear that anyone will want to DoS my weblog. If they really wanted to, they could (as I said), through smurf attacks or mt-search.cgi.

I think this provides more utility (to people who type in bad URLs or need a simple search feature) than danger of attack.

6 | Jennifer

April 16th, 2003 at 12:40 pm

I'm going to have to side with Amy on this. If you're not worried about DoS attacks – then you're probably all clear to add it as is… but I think I'll implement this *here* if/when the security issue is cleaned up.

7 | Michael Hanscom

April 17th, 2003 at 1:04 am

Count me in as not too worried about DoS attacks — just implemented it on my site, and it works beautifully!

Best part is, for me at least, it's not an MT-specific trick. I use a separate search engine for my site, since it encompasses more than just my blog, and all it took was putting in the appropriate search URL and making sure the string to grab the search result links worked for my setup.

Great little script!
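
The "string to grab the search result links" step Michael mentions could look something like this sketch. The fetch-and-regex approach is a stand-in for whatever Erik's script actually does, and $full_search_url is the variable from Amy's snippet above:

    <?php
    // Fetch the results page from whatever search engine you use and
    // pull out its links. The regex is illustrative; match it to the
    // markup your search engine actually emits.
    $html = @file_get_contents($full_search_url);
    if ($html !== false &&
        preg_match_all('|<a href="([^"]+)">([^<]+)</a>|i', $html, $m)) {
        echo "<ul>\n";
        foreach ($m[1] as $i => $url) {
            echo '<li><a href="' . htmlspecialchars($url) . '">'
               . htmlspecialchars($m[2][$i]) . "</a></li>\n";
        }
        echo "</ul>\n";
    }
    ?>

(Fetching a URL with file_get_contents() needs allow_url_fopen enabled, which it is by default.)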

8 | dave

April 18th, 2003 at 7:16 am

Just another little thing to add, but a string/character limiter may be good to throw in here. Excessively long URLs are a famous vector for buffer-overflow attacks, and in this case an attacker could go after Apache, PHP, or MySQL that way. I would trim search strings to some reasonable length using substr(), say 256 characters. You could even output a notice when a user exceeds the limit.
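
A sketch of the limiter dave describes, reusing the hypothetical $search_term variable from the snippets above:

    <?php
    // Cap the search term before it goes anywhere near the database,
    // and tell the user when it has been cut down.
    $max_len = 256;
    if (strlen($search_term) > $max_len) {
        $search_term = substr($search_term, 0, $max_len);
        echo "<p>Your search was trimmed to the first $max_len characters.</p>\n";
    }
    ?>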

9 | Jennifer

April 25th, 2003 at 10:03 am

This is in the trackback pings, but I think I need to update this post based on the problem he encountered… on the off chance people implement this without reading the "warnings". I think this IS a great work in progress and a great idea… but it still needs some work.

10 | Ian

April 25th, 2003 at 10:14 am

Firstly: yep – I should have read the comments before trying it. Consider my wrist duly slapped 😉

I think the best way to avoid similar problems would be to do some simple browser checking so the page didn't execute a search if the visitor was known to be a bot – either that or some kind of rate limiting, which might alleviate some of the problems discussed earlier too.

The "search from 404" concept is a good idea – it just needs a little more work to avoid some of the pitfalls.

11 | Ian

April 26th, 2003 at 8:03 am

Just been chatting to Brian about this – he's suggested adding a check to see what the server load average is before carrying out the search, and if it is over a preset value, simply return a standard 404 page. I might try making this modification when I get a second, but I don't think there is an easy way to make it cross-platform.
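
The load check itself is simple on Linux, which is also why it isn't portable: /proc/loadavg doesn't exist on other platforms. A sketch with a made-up threshold:

    <?php
    // Skip the search when the box is already busy (Linux only).
    $max_load = 3.0; // preset value; tune for your server
    $line = @file_get_contents('/proc/loadavg');
    if ($line !== false) {
        list($load) = explode(' ', $line); // first field: 1-minute average
        if ((float) $load > $max_load) {
            header('HTTP/1.0 404 Not Found');
            echo '<h1>404 Not Found</h1>';
            exit;
        }
    }
    // ... load is acceptable, run the search ...
    ?>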

12 | Clint

May 20th, 2003 at 10:36 pm

You *are not* making it easier. If someone is going to DDoS your site they can do it *just as easily using mt-search.cgi*. Neither way is "easier" or "harder"; if someone is going to do something, they'll do it. If you're worried about mt-search.cgi then remove it – it is the source of your problem :)

13 | Rob

May 22nd, 2003 at 6:13 am

To help with the googlebot problem, I set my script up so that you can only search using a directory in my weblog (such as /blog/searchterm) and not just /searchterm. If you just go to /searchterm it redirects to the standard 404 page. I'm sure it's possible to get the length of the REQUEST_URI and see if it's longer than, say, 20 characters, and if so just send them automatically to the 404.
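
Both of Rob's checks fit in a few lines. A sketch, with the /blog/ prefix, the 20-character limit, and the 404.html filename all standing in for whatever fits your site:

    <?php
    // Only URLs under /blog/ become searches, and anything suspiciously
    // long goes straight to the normal 404 page.
    $uri = $_SERVER['REQUEST_URI'];

    if (strlen($uri) > 20 || strncmp($uri, '/blog/', 6) !== 0) {
        header('HTTP/1.0 404 Not Found');
        include '404.html'; // your standard error page
        exit;
    }
    $search_term = substr($uri, 6); // whatever followed /blog/
    ?>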

14 | Kristian

May 25th, 2003 at 7:06 pm

A thought: perhaps, before it executes any other code (like those big database queries), the script could check the user-agent string to see if GoogleBot is listed in it and simply dump out a standard 404 response? Simple regex, right?
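
The regex really is simple; a sketch of Kristian's check, with an illustrative bot pattern you would extend from your own logs:

    <?php
    // Bail out with a plain 404 before any database work if the
    // client looks like a crawler.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    if (preg_match('/googlebot|slurp|crawler|spider/i', $ua)) {
        header('HTTP/1.0 404 Not Found');
        echo '<h1>404 Not Found</h1>';
        exit;
    }
    // ... probably a human, run the search ...
    ?>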

15 | tarun

July 17th, 2003 at 10:41 am

Why not just have it check whether it's a robot searching the site (and spit out a standard page) versus a user searching the site, or just limit the number of queries per unit of time…

16 | The Long Letter

April 17th, 2003 at 1:30 am

No more 404's
I just implemented a very nice little PHP script for my website that ties into my site search function — the end result being that my site no longer has a "404 File Not Found" error page!

17 | Ian Gregory Online

April 25th, 2003 at 9:44 am

Seemed like a good idea at the time
ScriptyGoddess posted a link to tip from NSLog(); suggesting a cool way of automatically searching from 404 pages, which I implemented a couple of days ago. Unfortunately I didn't take Googlebot into account, and when it visited today it generated

19 | Exordium

May 21st, 2003 at 1:28 am

404 Error Search Redux
I noticed an entry over on Antipixel about Erik's code for redirecting "404 File Not Found" errors to a search page. Well, I looked at the code and decided that I wanted to play around with it to make it
