scriptygoddess

18 May, 2002

Google, Google, Go away

Posted by: Jennifer In: Scripts

Since my journal is of a personal nature – I don't really want it indexed on Google. I had previously posted about a way to "temporarily" remove your site from their index – unfortunately, it DOES time out! I've done this a number of times, and… well… it's BAAACKK.

If you can't beat 'em – pretend you don't exist, and maybe they'll go away.

I've now added code to my blog so that if you get to it from a google search you get a "file not found" page. For those of you wishing to hide from Google – here's how I did it:

At the top of (every) page (obviously – this is done as an include) – BEFORE any <html> tags I have this:

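<?php
// Any referrer containing "google." counts as a Google search.
$itsagoogle = 'google.';
$ref = getenv("HTTP_REFERER");
if (($ref) and (strstr($ref, $itsagoogle)) ) {
// Show a fake "not found" page and stop before the real page is output.
print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
exit;
}
?>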

That's it! Now if you do a search in google, and my site comes up – you'll get that generic "file not found page". I'm all for simple solutions!!!

(Standard disclaimer: This requires that you can run php on your page and server)

update: For those of you who used this code previous to 11:00pm on 5/18 please note I made a slight change so that ANY google referrer would be blocked (ie. the original script didn't work if they came from a www.google.ca search) but now it's fixed…
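
For anyone wondering why matching on "google." (with the trailing dot) also covers the country domains, here is a minimal sketch of the same check wrapped in a function. The name isGoogleReferrer() is made up for illustration and is not part of the script above; the test itself is the same plain substring match.

```php
<?php
// Hypothetical helper (not from the original script): true when the
// referrer contains "google.", whatever the country domain.
function isGoogleReferrer($ref)
{
    // getenv() returns false when no referrer was sent, so reject non-strings.
    // strpos() can return 0 for a match at the start, hence the !== false test.
    return is_string($ref) && strpos($ref, 'google.') !== false;
}

// Same behaviour as the script above: fake "not found" page, then stop.
if (isGoogleReferrer(getenv('HTTP_REFERER'))) {
    print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
    exit;
}
```

Because this is only a substring test, a referrer from www.google.ca and one from www.google.com are both caught, but so is any other address that happens to contain "google.".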

Update 3/16/03: Ron had a hack elsewhere that would work here if you'd like to add more rejected referrers in addition to Google. With his hack, here's how it goes: (this should be one of the first things on your page)

<?php
function isBadReferrer($ref)
{
if (
(strstr($ref, "google.")) or
(strstr($ref, "aolsearch.aol.com")) or
(strstr($ref, "search.yahoo.com")) or
(strstr($ref, "search.msn.com")) or
(strstr($ref, "hotbot.com"))
//add more like the above line to add more "rejected" referrals
)
{
return true;
}
else
{
return false;
}
}
$ref = getenv("HTTP_REFERER");
if (($ref) and (isBadReferrer($ref) )) {
print('<head><title>File Not Found</title></head><body><H1>File Not Found</h1><p>The requested URL was not found on this server.</p></BODY>');
exit;
}
?>

46 Responses to "Google, Google, Go away"

1 | Gina

May 18th, 2002 at 4:26 pm

Thank you Jenn! I've had this same problem and am so tired of resubmitting to have my site not listed. This is going to be great!

2 | Jennifer

May 18th, 2002 at 5:26 pm

Just so you know – this is still only a "superficial fix", as anyone intent on getting to your site can just manually type in your URL and hit enter (then hit refresh if necessary), and they're on. But, of course, they'd have to KNOW to do that! LOL! 😉

3 | Row

May 18th, 2002 at 6:52 pm

That's a good little trick! Especially if you have a blog listed on the Australian NineMSN blog directory. I don't know who wrote those descriptions, but they're most unflattering!

Hopefully now that I've changed domains and added meta info, I won't get spidered at all. *crosses fingers*

4 | Pete

May 18th, 2002 at 7:00 pm

It should also be noted that a better solution (IMHO) is to never be indexed in the first place. How? Most (google included) index-bots follow the robots.txt standard.

Info on Google's bot can be found here: http://www.google.com/bot.html

Info on the robots.txt standard can be found here:
http://www.robotstxt.com

In theory, there are three or four pages on my site which should not be indexed by search engines, and after almost 1,000 search referrals, they've never been hit. One of them is the page of "interesting search referrals" so this does work for google, if done correctly. :)

5 | Jennifer

May 18th, 2002 at 7:16 pm

Pete – I've done that… but I still get indexed :(

7 | Jade

May 18th, 2002 at 8:09 pm

Google ignored my robots.txt, too, and I know of at least one other person whose robots.txt it ignored as well. Most annoying!

8 | Lynda

May 18th, 2002 at 8:22 pm

This is a great little trick Jennifer!! It could also be used if some particular person or site was linking to you and you didn't want the traffic, you could shut them out and take up considerably less bandwidth. That doesn't happen all too often, but I'm sure it happens sometimes (I know it's happened to me before) so perhaps this would be a thought for that as well.

As far as google goes, I must be a lucky one. On my posh site, I just put a noindex, nofollow meta tag – I haven't been crawled by google on Posh for months now.

9 | jess

May 18th, 2002 at 8:53 pm

this would work great for blocking that stupid "portal of evil" site or whatever it's called. wish i had known about this a while back.

i finally got google to stop indexing me by using this text in my robots.txt file:

User-agent: *
Disallow: /

User-agent: DittoSpyder
User-agent: Googlebot
Disallow: /

10 | Gina

May 20th, 2002 at 11:26 am

I'm curious. Can you have TOO many parts of text in your .htaccess code? Will one override the other and etc. or possibly cause the blocking text to not work completely?

11 | Jennifer

May 20th, 2002 at 1:52 pm

I have only a few lines in my .htaccess, and now only the text above in my robots.txt file… don't think that's the problem.

I once read somewhere that some of the "keyword to link" associations that Google makes are based on how often throughout the net a particular (keyword) is used as the link to a page…

So in this case, my name: I leave a lot of comments on people's blogs – and on that page, my name is linked to my site. Therefore if you do a search for "Jennifer", Google knows that "Jennifer" is very often linked to "www.scriptygoddess.com"…

The article I read talked about how you could use that to play a prank on someone. Let's say your friend's site is http://www.joe.com. If on your page, everywhere you used the (keyword) "jerk" you linked to your friend's site, and you asked a ton of friends to do the same – Google doesn't even have to spider http://www.joe.com... from its spidering of OTHER pages, it draws the association of jerk and http://www.joe.com... so you do a search on the keyword "jerk" and up will pop your friend's site…

Wish I could find that article…

12 | Jennifer

May 20th, 2002 at 1:55 pm

…more proof that THAT is what's going on here… You'll notice that for any search that returns my site, the "cached" option is not available… it's because Google IS NOT spidering my site, but that doesn't solve my problem: the site still comes up in searches.

13 | Gaile

May 21st, 2002 at 3:20 am

Interesting piece of script. I recently discovered that several people found me by typing my name into google and hitting the "I feel lucky" option. I don't mind being found, but that is a little *too* easy, since there are people I'd really rather didn't know I have a weblog.

14 | Richard

May 22nd, 2002 at 1:53 am

A question/suggestion: How about sniffing out the IP of Google's crawler (which I believe has 'googlebot' in it) instead of the referer? This would make sure that the Google cache gets a 'bogus' copy (for those nefarious types, like myself, who sometimes look at the cached copy of a site). Also, to mask that they used Google, they could simply copy the URL into the clipboard and paste it into a different browser (another favourite technique of mine when I want to mask what I'm searching for).
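
A rough sketch of that idea, with one swap stated plainly: rather than the crawler's IP address, it looks at the User-Agent header, which is where the crawler's name shows up in a request. The function name looksLikeGooglebot() is hypothetical, and since any client can send whatever User-Agent it likes, treat this as a courtesy filter rather than real protection.

```php
<?php
// Hypothetical helper: true when the User-Agent header names Googlebot.
function looksLikeGooglebot($userAgent)
{
    // stripos() is case-insensitive; a missing header (false) never matches.
    return is_string($userAgent) && stripos($userAgent, 'googlebot') !== false;
}

// Serve the crawler the same fake "not found" page the referrer check uses.
if (looksLikeGooglebot(getenv('HTTP_USER_AGENT'))) {
    print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
    exit;
}
```

Like the referrer check, this would need to run before any other output on the page.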

15 | robert

May 22nd, 2002 at 3:15 am

Jennifer, that google trick you were talking about is called a google bomb and the article is here.

16 | Jennifer

May 22nd, 2002 at 5:45 am

Robert – Yup that's the article!! Thanks for the link! 😀

Richard – re: crawler… see the comments above: My site isn't actually being crawled, and there isn't a cached version of my site available on Google. As for people copying and pasting the URL… if they're going to that extreme to see my site, then fine. I think most people will see the "page not found" and move along. I'm not trying to block everyone, just random people hitting my site, who aren't really interested in blogs in the first place.

17 | mark

May 22nd, 2002 at 11:15 am

There's a nice little tutorial on using robots.txt at http://www.searchengineworld.com/robots/robots_tutorial.htm

18 | Selena

May 24th, 2002 at 10:40 pm

Hi,
I was looking for MT hacks in google and found your site here
http://www.google.com/search?hl=en&lr=&ie=UTF8&oe=UTF8&q=moveable+type+hacks
Thought you might like to know..

Selena

19 | Jennifer

May 24th, 2002 at 10:44 pm

Selena – that brings up scriptygoddess. That's okay. I was hiding another site. That's where I'm using the code.

20 | jess

June 27th, 2002 at 10:59 am

jenn, i modified your code to look for the term "search" within the referrer link… this has enabled me to block out many other search engines, such as hotbot, msn, and altavista. :) thanks for the code!!!

<?php
// Block any referrer whose address contains the word "search".
$itsasearch = 'search';
$ref = getenv("HTTP_REFERER");
if (($ref) and (strstr($ref, $itsasearch)) ) {
print('<head><title>File Not Found</title></head><body><H1>File Not Found</h1><p>The requested URL was not found on this server.</p></BODY>');
exit;
}
?>
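
One thing to watch with matching "search" anywhere in the referrer: an ordinary link from a page whose address merely contains the word (say, something under /research/) would be turned away too. Below is a minimal sketch that compares only the host name of the referrer. The function name refererHostMatches() and the list of fragments are assumptions for illustration, not part of the code above.

```php
<?php
// Hypothetical helper: true when the referrer's host name contains one
// of the given fragments. The path and query string are ignored.
function refererHostMatches($ref, array $fragments)
{
    if (!is_string($ref) || $ref === '') {
        return false; // no referrer was sent
    }
    $host = parse_url($ref, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host part
    }
    foreach ($fragments as $fragment) {
        if (stripos($host, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// Example list; adjust to the engines you want to turn away.
$blockedHosts = array('google.', 'search.', 'altavista.', 'hotbot.');

if (refererHostMatches(getenv('HTTP_REFERER'), $blockedHosts)) {
    print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
    exit;
}
```

The trade-off is that an engine whose host name contains none of the listed fragments slips through, so the list needs more upkeep than the one-word version.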

21 | Gregory

July 26th, 2002 at 3:53 pm

If the robots.txt wasn't working right for you, the other possibility is to use the META tags for such things. Here's a site I found that has some good data on it: http://www.ceebanff.ca/help/tags/

22 | Elisa

November 4th, 2002 at 11:50 am

I found a link directly on Google's site on how to remove content from their indexes. There are various methods, but basically:

"If you want to prevent all robots from indexing individual pages on your site, then you can place the following meta tag element into the page's HTML code:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

If you want to allow other robots to index individual pages on your site, preventing only Google's robots from indexing the pages, use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

More information on this standard meta tag element is available here: http://www.robotstxt.org/wc/exclusion.html#meta."

Word for word from that link – which, by the way, I can't remember where I found on the Google site, but which I fortunately saved to my hard drive last time. :)

24 | Jennifer

November 4th, 2002 at 2:01 pm

Elisa – the only problem is that I was still getting listed in Google searches, even after doing everything they said on the page…

Read through the comments above for the explanation why…

25 | scott

January 13th, 2003 at 9:15 pm

I modified Jennifer's script by adding additional conditionals to check for more than one search engine, thusly:

<?php
$google = 'google.';
$altavista = 'altavista.';
$ref = getenv("HTTP_REFERER");
// The page shown to blocked visitors.
$goaway = '<head><title>404 Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>';

if (($ref) and (strstr($ref, $google)) ) {
print($goaway);
exit;
} elseif (($ref) and (strstr($ref, $altavista)) ) {
print($goaway);
exit;
}
?>

This can be expanded infinitely, although Jess' substitution of "search" for "google" in Jennifer's original script may be the most effective method to combat the bots (aside from not getting indexed in the first place).

26 | eve

February 24th, 2003 at 7:26 pm

I think i'm going to have to add this to my trackback pages because that's all google wants to index it seems.

27 | hmw

March 1st, 2003 at 1:25 pm

Fantastic!!! I have removed myself from google so many times….I can't even bear the thought of doing it anymore! I have my robots.txt set up and meta tags to turn them away but no luck – this is amazing!

Now that I don't work in the clin lab anymore, I don't really give a crap if anyone can find my site but I can't deal with the pervs that hit on Brittany's site…..that's why that one is password protected now.

Thank you so much for this :-)

28 | Christine

March 5th, 2003 at 5:02 pm

Thank you, thank you, thank you for this script!

29 | carol

March 6th, 2003 at 6:24 pm

Can I use these tags with Blogger Pro?

30 | Kim

March 17th, 2003 at 9:52 am

A faster method (this code must be placed before all HTML code (top of the file)):

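<?php

// Substrings that mark a referrer as blocked.
$blocked = array("google", "search");
$ref = getenv("HTTP_REFERER");

// in_array() would only match a referrer that is exactly "google" or
// "search", so test each entry as a substring instead.
foreach ($blocked as $needle) {
    if ($ref && strpos($ref, $needle) !== false) {
        header("HTTP/1.0 404 Not Found");
        exit; // stop here so the real page is not sent after the 404 status
    }
}

?>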

31 | Annessa

March 18th, 2003 at 12:37 pm

This is WONDERFUL! It works great on my site. I do have a question, though, what would I need to do to get the site in question here (http://www.geekgrrl.com/archives/001891.php) to quit indexing me. Is there a way to do it using this script?

32 | Jennifer

March 18th, 2003 at 12:47 pm

I think the only way to do that is using .htaccess: see here

33 | cal

March 19th, 2003 at 9:40 am

this is great…i have put in the revised script. however, some of the code is showing up on my comment popup pages. if you go to my site and click on the comment link, you'll see what i mean. it's happening in comment preview as well. odd.

34 | Jennifer

March 19th, 2003 at 10:24 am

You cannot put this code on typical pop-up comment pages because they are CGI pages and therefore cannot process PHP code. As far as I know – those pages are being generated dynamically – so I don't think Google or other search engines can index them – so it shouldn't be a problem.

35 | cal

March 19th, 2003 at 10:37 am

as usual, you're marvelous, jennifer…i had forgotten they were dynamically generated. thanks so much!

36 | Jennifer

May 12th, 2003 at 1:20 pm

slight modification to the code above so you can also tack on IPs to "hide" from as well…

<?php
function isBadReferrer($ref, $ip) {
if (
(strstr($ref, "google.")) or
(strstr($ref, "aolsearch.aol.com")) or
(strstr($ref, "search.yahoo.com")) or
(strstr($ref, "search.msn.com")) or
(strstr($ref, "hotbot.com")) or
(strstr($ip, "123.456.7890"))
/*
add more like the above line to add more "rejected" referrals. The "123.456.7890" is a dummy ip to show you how you enter in IPs…
*/
)
{
return true;
} else {
return false;
}
}

// Cast so a missing referrer becomes an empty string rather than false.
$ref = (string) getenv("HTTP_REFERER");

// Behind a proxy the visitor's address arrives in X-Forwarded-For.
if (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
$ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
} else {
$ip = $_SERVER['REMOTE_ADDR'];
}

// No separate referrer test here, so a listed IP is turned away even when no referrer is sent.
if (isBadReferrer($ref, $ip)) {
print('<head><title>File Not Found</title></head><body><H1>File Not Found</h1><p>The requested URL was not found on this server.</p></BODY>');
exit;
}
?>

37 | titilayo

July 10th, 2003 at 7:00 pm

I've been using this code and it's been working fine, but I've been wondering how I could get it to be even more foolproof. In doing a search to find out how the code Kim posted above would work, I came up with this solution.

In the code Jennifer posted, replace:

print('<head><title>File Not Found</title></head><body><H1>File Not Found</h1><p>The requested URL was not found on this server.</p></BODY>')

with

header("Location: http://www.google.com/")

so your whole code snippet will look something like:

<?php
function isBadReferrer($ref)
{
if (
(strstr($ref, "google.")) or
(strstr($ref, "aolsearch.aol.com")) or
(strstr($ref, "search.yahoo.com")) or
(strstr($ref, "search.msn.com")) or
(strstr($ref, "hotbot.com"))
//add more like the above line to add more "rejected" referrals
)
{
return true;
}
else
{
return false;
}
}

$ref = getenv("HTTP_REFERER");
if (($ref) and (isBadReferrer($ref) )) {
header("Location: http://www.google.com/");

exit;
}
?>

This code will redirect anyone who comes to your page via Google or any of the other listed search engines to the URL you specify in the header() call (in this case I've made it http://www.google.com/, but it can be anywhere you want). It works a charm: anyone who comes to my site via the "blocked" search engines is redirected back to the Google home page.

38 | the country girl

September 23rd, 2003 at 7:30 am

I was just curious if you could tack this code at the bottom of your cookiecheck.php if you have your blog skinned???

39 | susan

October 6th, 2003 at 8:44 pm

I can't seem to get the code with the IP addresses working.. it keeps telling me I have a parse error…

40 | manuel

February 12th, 2004 at 7:28 pm

hi, i've tried entering the codes onto my blogspot.com page. when i use the last one listed on this page, it has zero effect; when i use the first one listed on this page, it prints the "error" message and my blog, no matter whether i enter the address directly or arrive via a search engine.

right now, i have the "redirect to http://www.google.com" code on my page.

all help is appreciated… thanks, *M

41 | Jennifer

February 12th, 2004 at 7:33 pm

I'm reasonably certain that if you're hosting your site on blogspot – they don't let you run server side scripts like PHP (which is what you need to do with this script)… May I recommend getting your own hosting account with Blogomania? 😀

42 | i1277

March 10th, 2004 at 8:23 pm

Some nice links on your frontpage there!

Oh, by the way, I found this entry through a google search on "hide from google"…

43 | i1277

March 10th, 2004 at 8:32 pm

Oh, you were probably speaking about another blog. Anyway, still nice links.

44 | Laura

April 10th, 2004 at 11:20 pm

Is there a way you could post that script in a form that I could paste into my LiveJournal's code (on the S2 system)? Thanks.

45 | disappointed

April 19th, 2004 at 8:59 pm

I found and came to this site from Google. Script does not work.
I don't have any program/firewall/whatever that would prevent sending the referrer (tested it on another site).

46 | Jennifer

April 19th, 2004 at 9:03 pm

Actually it does work – I'm not using that script on THIS site.

I should note that there are A LOT of scripts posted on this site. VERY FEW of them are actually being used here. Mostly I'm just sharing information/scripts for other people
