Google, Google, Go away

Since my journal is of a personal nature - I don’t really want it indexed on Google. I had previously posted about a way to “temporarily” remove your site from their index - unfortunately, it DOES time out! I’ve done this a number of times, and… well… it’s BAAACKK.

If you can’t beat ‘em - pretend you don’t exist, and maybe they’ll go away.

I’ve now added code to my blog so that if you get to it from a google search you get a “file not found” page. For those of you wishing to hide from Google - here’s how I did it:

At the top of (every) page (obviously - this is done as an include) - BEFORE any <html> tags I have this:

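<?php
// Block visitors arriving from any Google domain (google.com, google.ca, ...).
$itsagoogle = 'google.';
$ref = getenv("HTTP_REFERER");
// Only act when a referrer was sent and it contains "google.".
if (($ref) and (strstr($ref, $itsagoogle))) {
    print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
    exit;
}
?>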

That’s it! Now if you do a search in Google and my site comes up, you’ll get that generic “file not found” page. I’m all for simple solutions!!!

(Standard disclaimer: This requires that you can run PHP on your page and server.)

Update: For those of you who used this code before 11:00pm on 5/18, please note I made a slight change so that ANY Google referrer is blocked (i.e. the original script didn’t work if they came from a www.google.ca search), but now it’s fixed…

Update 3/16/03: Ron had a hack elsewhere that would work here if you’d like to add more rejected referrers in addition to Google. With his hack, here’s how it goes (this should be one of the first things on your page):

<?php
// Returns true when the referrer contains any of the listed search engine hosts.
function isBadReferrer($ref)
{
    if (
        (strstr($ref, "google.")) or
        (strstr($ref, "aolsearch.aol.com")) or
        (strstr($ref, "search.yahoo.com")) or
        (strstr($ref, "search.msn.com")) or
        (strstr($ref, "hotbot.com"))
        // add more like the above line to add more "rejected" referrals
    )
    {
        return true;
    }
    else
    {
        return false;
    }
}
$ref = getenv("HTTP_REFERER");
if (($ref) and (isBadReferrer($ref))) {
    print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
    exit;
}
?>
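
A possible variation on the scripts above, offered only as a rough sketch: it compares the blocked names against the host part of the referrer rather than the whole URL, and it sends an actual 404 status line before the fake error page. The function name isBlockedReferrer and the $blockedHosts list are made up for illustration, and it assumes the snippet still runs before any other output so that header() can take effect.

```php
<?php
// Hypothetical helper: true when the referrer's host matches a blocked search engine.
function isBlockedReferrer($ref, array $blockedHosts)
{
    if (!is_string($ref) || $ref === '') {
        return false; // no referrer sent (typed URL, bookmark, privacy tool)
    }
    $host = parse_url($ref, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // referrer was not a parseable URL with a host
    }
    $host = strtolower($host);
    foreach ($blockedHosts as $needle) {
        // Substring match on the host only, so "google." still covers google.ca etc.
        if (strpos($host, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative list, taken from the hosts named in the scripts above.
$blockedHosts = array('google.', 'aolsearch.aol.com', 'search.yahoo.com', 'search.msn.com', 'hotbot.com');
$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if (isBlockedReferrer($ref, $blockedHosts)) {
    // Send a real 404 status instead of a normal page that only looks like an error.
    header('HTTP/1.0 404 Not Found');
    print('<html><head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body></html>');
    exit;
}
```

Matching on the host means a link from some other site whose address merely contains the text "google.com" in its query string is not turned away by accident.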

46 Responses to “Google, Google, Go away”

  1. Gina Says:

    Thank you Jenn! I’ve had this same problem and am so tired of resubmitting to have my site not listed. This is going to be great!

  2. Jennifer Says:

    Just so you know - this is still only a “superficial fix”, as anyone intent on getting to your site can just manually type in your URL and hit enter (then hit refresh if necessary) and they’re on. But, of course, they’d have to KNOW to do that! LOL! ;-)

  3. Row Says:

    That’s a good little trick! Especially if you have a blog listed on the Australian NineMSN blog directory. I don’t know who wrote those descriptions, but they’re most unflattering!

    Hopefully now that I’ve changed domains and added meta info, I won’t get spidered at all. *crosses fingers*

  4. Pete Says:

    It should also be noted that a better solution (IMHO) is to never be indexed in the first place. How? Most (google included) index-bots follow the robots.txt standard.

    Info on Google’s bot can be found here: http://www.google.com/bot.html

    Info on the robots.txt standard can be found here:
    http://www.robotstxt.com

    In theory, there are three or four pages on my site which should not be indexed by search engines, and after almost 1,000 search referrals, they’ve never been hit. One of them is the page of “interesting search referrals” so this does work for google, if done correctly. :)

  5. Jennifer Says:

    Pete - I’ve done that… but I still get indexed :(

  6. Jennifer Says:

    Here’s the link on this site

  7. Jade Says:

    Google ignored my robots.txt, too, and I know of at least one other person who it ignored it for as well. Most annoying!

  8. Lynda Says:

    This is a great little trick Jennifer!! It could also be used if some particular person or site was linking to you and you didn’t want the traffic, you could shut them out and take up considerably less bandwidth. That doesn’t happen all too often, but I’m sure it happens sometimes (I know it’s happened to me before) so perhaps this would be a thought for that as well.

    As far as google goes, I must be a lucky one. On my posh site, I just put a noindex, nofollow meta tag - I haven’t been crawled by google on Posh for months now.

  9. jess Says:

    this would work great for blocking that stupid “portal of evil” site or whatever it’s called. wish i had known about this a while back.

    i finally got google to stop indexing me by using this text in my robots.txt file:

    User-agent: *
    Disallow: /
    User-agent: DittoSpyder
    User-agent: Googlebot
    Disallow: /

  10. Gina Says:

    I’m curious. Can you have TOO many parts of text in your .htaccess code? Will one override the other and etc. or possibly cause the blocking text to not work completely?

  11. Jennifer Says:

    I have only a few lines in my .htaccess, and now only the text above in my robots.txt file… don’t think that’s the problem.

    I once read somewhere that some of the “keyword to link” associations that Google does is based on how often throughout the net a particular (keyword) is used as the link to a page…

    So in this case, my name: I leave a lot of comments on people’s blogs - and on that page, my name is linked to my site. Therefore if you do a search for “Jennifer”, Google knows that “Jennifer” is very often linked to “www.scriptygoddess.com”…

    The article I read talked about how you could use that to play a prank on someone. Let’s say your friend’s site is http://www.joe.com. If on your page, everywhere you used the (keyword) “jerk” you linked to your friend’s site, and you asked a ton of friends to do the same - Google doesn’t even have to spider http://www.joe.com... from its spidering of OTHER pages, it draws the association of jerk and http://www.joe.com... so you do a search on the keyword “jerk” and up will pop your friend’s site…

    Wish I could find that article…

  12. Jennifer Says:

    …more proof that THAT is what’s going on here… You’ll notice any search that returns my site, the “cached” option is not available… it’s because Google IS NOT spidering my site, but that doesn’t fix my problem of not coming up on searches.

  13. Gaile Says:

    Interesting piece of script. I recently discovered that several people found me by typing my name into google and hitting the “I feel lucky” option. I don’t mind being found, but that is a little *too* easy, since there are people that I really rather didn’t know I have a weblog.

  14. Richard Says:

    A question/suggestion: How about sniffing out the IP of Google’s crawler (which I believe has ‘googlebot’ in it) instead of the referer? This would make sure that the Google cache gets a ‘bogus’ copy (for those nefarious types, like myself, who sometimes look at the cached copy of a site). Also, to mask that they used Google, they could simply copy the URL into the clipboard and paste it into a different browser (another favourite technique of mine when I want to mask what I’m searching for).
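
    Richard's idea might look something like the sketch below. It is only an illustration and rests on one assumption: that the crawler announces itself with "Googlebot" in its User-Agent header, which is the usual place for that name rather than the IP address. The function name isBlockedCrawler and the $crawlerNames list are invented for the example.

    ```php
    <?php
    // Hypothetical helper: true when the User-Agent string names a listed crawler.
    function isBlockedCrawler($userAgent, array $crawlerNames)
    {
        if (!is_string($userAgent) || $userAgent === '') {
            return false; // no User-Agent header was sent
        }
        foreach ($crawlerNames as $name) {
            // stripos() ignores case, so "googlebot" and "Googlebot" both match.
            if (stripos($userAgent, $name) !== false) {
                return true;
            }
        }
        return false;
    }

    // Illustrative list; add other crawler names in the same way.
    $crawlerNames = array('Googlebot');
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    if (isBlockedCrawler($agent, $crawlerNames)) {
        // The crawler itself receives the "not found" page, so that is what it would store.
        header('HTTP/1.0 404 Not Found');
        print('<html><head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body></html>');
        exit;
    }
    ```

    Unlike the referrer check, this looks at who is requesting the page rather than where a visitor clicked from, so the two checks could be used side by side.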

  15. robert Says:

    Jennifer, that google trick you were talking about is called a google bomb and the article is here.

  16. Jennifer Says:

    Robert - Yup that’s the article!! Thanks for the link! :D
    Richard - re: crawler… see the comments above: My site isn’t actually being crawled, and there isn’t a cached version of my site available on Google. As for people copying and pasting the URL… if they’re going to that extreme to see my site, then fine. I think most people will see the “page not found” and move along. I’m not trying to block everyone, just random people hitting my site, who aren’t really interested in blogs in the first place.

  17. mark Says:

    There’s a nice little tutorial on using robots.txt at http://www.searchengineworld.com/robots/robots_tutorial.htm

  18. Selena Says:

    Hi,
    I was looking for MT hacks in google and found your site here
    http://www.google.com/search?hl=en&lr=&ie=UTF8&oe=UTF8&q=moveable+type+hacks
    Thought you might like to know..

    Selena

  19. Jennifer Says:

    Selena - that brings up scriptygoddess. That’s okay. I was hiding another site. That’s where I’m using the code.

  20. jess Says:

    jenn, i modified your code to look for the term “search” within the referrer link… this has enabled me to block out many other search engines, such as hotbot, msn, and altavista. :) thanks for the code!!!

    <?php
    // Block any referrer whose URL contains the word "search".
    $itsasearch = 'search';
    $ref = getenv("HTTP_REFERER");
    if (($ref) and (strstr($ref, $itsasearch))) {
        print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
        exit;
    }
    ?>

  21. Gregory Says:

    If the robots.txt wasn’t working right for you, the other possibility is to use the META tags for such things. Here’s a site I found that has some good data on it: http://www.ceebanff.ca/help/tags/

  22. Elisa Says:

    I found a link directly off the Google’s site on how to remove content from their indexes. There are various methods, but basically:

    “If you want to prevent all robots from indexing individual pages on your site, then you can place the following meta tag element into the page’s HTML code:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

    If you want to allow other robots to index individual pages on your site, preventing only Google’s robots from indexing the pages, use the following tag:

    <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

    More information on this standard meta tag element is available here: http://www.robotstxt.org/wc/exclusion.html#meta.”

    Word for word from that link - which, by the way, I can’t remember where I found it on the Google site, but fortunately I saved it to my hard drive the last time. :)

  23. Elisa Says:

    Oh. Here’s the link: http://www.google.com/remove.html :)

  24. Jennifer Says:

    Elisa - the only problem is that I was still getting listed in Google searches, even after doing everything they said on the page…

    Read through the comments above for the explanation why…

  25. scott Says:

    I modified Jennifer’s script by adding additional conditionals to check for more than one search engine, thusly:

    <?php
    $google = 'google.';
    $altavista = 'altavista.';
    $ref = getenv("HTTP_REFERER");
    // The page shown to blocked visitors.
    $goaway = '<head><title>404 Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>';

    if (($ref) and (strstr($ref, $google))) {
        print($goaway);
        exit;
    } elseif (($ref) and (strstr($ref, $altavista))) {
        print($goaway);
        exit;
    }
    ?>

    This can be expanded infinitely, although Jess’ substitution of “search” for “google” in Jennifer’s original script may be the most effective method to combat the bots (aside from not getting indexed in the first place).

  26. eve Says:

    I think i’m going to have to add this to my trackback pages because that’s all google wants to index it seems.

  27. hmw Says:

    Fantastic!!! I have removed myself from google so many times….I can’t even bear the thought of doing it anymore! I have my robots.txt set up and meta tags to turn them away but no luck - this is amazing!

    Now that I don’t work in the clin lab anymore, I don’t really give a crap if anyone can find my site but I can’t deal with the pervs that hit on Brittany’s site…..that’s why that one is password protected now.

    Thank you so much for this :-)

  28. Christine Says:

    Thank you, thank you, thank you for this script!

  29. carol Says:

    Can I use these tags with blogger-pro?

  30. Kim Says:

    A faster method (this code must be placed before all HTML code (top of the file)):

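    <?php

    // Substrings that mark a referrer as blocked.
    $blocked = array("google", "search");
    $ref = getenv("HTTP_REFERER");

    // in_array() only matches the whole referrer string, so test each entry as a substring.
    foreach ($blocked as $needle) {
        if ($ref && strpos($ref, $needle) !== false) {
            header("HTTP/1.0 404 Not Found");
            exit;
        }
    }

    ?>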

  31. Annessa Says:

    This is WONDERFUL! It works great on my site. I do have a question, though, what would I need to do to get the site in question here (http://www.geekgrrl.com/archives/001891.php) to quit indexing me. Is there a way to do it using this script?

  32. Jennifer Says:

    I think the only way to do that is using .htaccess: see here

  33. cal Says:

    this is great…i have put in the revised script. however, some of the code is showing up on my comment popup pages. if you go to my site and click on the comment link, you’ll see what i mean. it’s happening in comment preview as well. odd.

  34. Jennifer Says:

    You can not put this code on typical pop up comments because they are a CGI page and therefore can not process PHP code. As far as I know - those pages are being generated dynamically - so I don’t think Google or other search engines can index them - so it shouldn’t be a problem.

  35. cal Says:

    as usual, you’re marvelous, jennifer…i had forgotten they were dynamically generated. thanks so much!

  36. Jennifer Says:

    slight modification to the code above so you can also tack on IPs to “hide” from as well…

    <?php
    function isBadReferrer($ref, $ip) {
        if (
            (strstr($ref, "google.")) or
            (strstr($ref, "aolsearch.aol.com")) or
            (strstr($ref, "search.yahoo.com")) or
            (strstr($ref, "search.msn.com")) or
            (strstr($ref, "hotbot.com")) or
            (strstr($ip, "123.456.7890"))
            /*
            add more like the above line to add more "rejected" referrals. The "123.456.7890" is a dummy ip to show you how you enter in IPs…
            */
        )
        {
            return true;
        } else {
            return false;
        }
    }

    $ref = getenv("HTTP_REFERER");

    // Behind a proxy the visitor's address may arrive in X-Forwarded-For; otherwise use REMOTE_ADDR.
    if (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
    } else {
        $ip = $_SERVER['REMOTE_ADDR'];
    }

    // No ($ref) guard here, so a listed IP is blocked even when no referrer is sent.
    if (isBadReferrer((string) $ref, $ip)) {
        print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>');
        exit;
    }
    ?>

  37. titilayo Says:

    I’ve been using this code and it’s been working fine, but I’ve been wondering how I could get it to be even more foolproof. In doing a search to find out how the code Kim posted above would work, I came up with this solution.

    In the code Jennifer posted, replace:

    print('<head><title>File Not Found</title></head><body><h1>File Not Found</h1><p>The requested URL was not found on this server.</p></body>')

    with

    header("Location: http://www.google.com/")

    so your whole code snippet will look something like:

    <?php
    function isBadReferrer($ref)
    {
        if (
            (strstr($ref, "google.")) or
            (strstr($ref, "aolsearch.aol.com")) or
            (strstr($ref, "search.yahoo.com")) or
            (strstr($ref, "search.msn.com")) or
            (strstr($ref, "hotbot.com"))
            // add more like the above line to add more "rejected" referrals
        )
        {
            return true;
        }
        else
        {
            return false;
        }
    }

    $ref = getenv("HTTP_REFERER");
    if (($ref) and (isBadReferrer($ref))) {
        // Send the visitor elsewhere instead of printing an error page.
        header("Location: http://www.google.com/");
        exit;
    }
    ?>

    This code will redirect anyone who comes to your page via Google or any of the other listed search engines to the site you specify in the Location header (in this case I’ve made it http://www.google.com, but it can be anywhere you want). It works a charm - anyone who comes to my site via the “blocked” search engines is redirected back to the Google home page.

  38. the country girl Says:

    I was just curious if you could tack this code at the bottom of your cookiecheck.php if you have your blog skinned???

  39. susan Says:

    I can’t seem to get the code with the IP addresses working.. it keeps telling me I have a parse error…

  40. manuel Says:

    hi, i’ve tried entering each of the codes onto my blogspot.com page. when i use the last one listed on this page, it has zero effect; when i use the first one listed on this page, it prints the “error” message and my blog, no matter if i enter the address directly or arrive via a search engine.

    right now, i have the “redirect to http://www.google.com” code on my page.

    all help is appreciated… thanks, *M

  41. Jennifer Says:

    I’m reasonably certain that if you’re hosting your site on blogspot - they don’t let you run server side scripts like PHP (which is what you need to do with this script)… May I recommend getting your own hosting account with Blogomania? :D

  42. i1277 Says:

    Some nice links on your frontpage there!

    Oh, by the way, I found this entry through a google search on “hide from google”…

  43. i1277 Says:

    Oh, you were probably speaking about another blog. Anyway, still nice links.

  44. Laura Says:

    Is there a way you could post that script in a way that I could paste it into my LiveJournal’s code (on the S2 system)? Thanks.

  45. disappointed Says:

    I found and came to this site from Google. Script does not work.
    I don’t have any program/firewall/whatever that would prevent sending the referrer (tested it on another site).

  46. Jennifer Says:

    Actually it does work - I’m not using that script on THIS site.

    I should note that there are A LOT of scripts posted on this site. VERY FEW of them are actually being used here. Mostly I’m just sharing information/scripts for other people.