Code Newbie
robots.txt -- Hiding files and dirs from search bot spiders
by DavH27
Do you ever wonder why Google sometimes returns obscure pages such as custom error pages? Often it is because the web author has not uploaded a robots.txt file to the site's root directory!

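Type the relevant lines from the examples below into a plain text editor such as Notepad, and be sure to save the file as robots.txt. The name should be all lowercase, because that is the name spiders request and many servers treat file names as case-sensitive.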

You can stop all spiders from indexing the entire site (not recommended if you wish to be listed at all):
Code:
User-agent: *   # This means all spiders
Disallow: /     # This means the entire site
You can also keep a single spider out of one directory:
Code:
User-agent: Googlebot   # Only Googlebot is specified to be disallowed
# This stops Googlebot from indexing any pages in the www.homepage.com/secret directory
Disallow: /secret
You can list more than one spider and block individual files too:
Code:
# This one shows that more than one spider can be specified. It also shows
# that individual files can be disallowed from indexing as well as whole
# directories. The file paths have to be relative to the site root, though,
# so each one starts with "/" instead of being a full URL.
User-agent: Googlebot
User-agent: Roverdog
Disallow: /SecretAgents.html
Disallow: /Porn/RudeStuff.php
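If you would like to check a set of rules before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The RULES text is a sample modelled on the examples above, written with a leading slash on every path, and is_allowed is a helper name made up for this illustration. Python's parser is only one interpretation of the rules, so a real spider may behave a little differently.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only (an assumption, not a file from a real site).
RULES = """\
User-agent: Googlebot
User-agent: Roverdog
Disallow: /SecretAgents.html
Disallow: /Porn/RudeStuff.php
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse the text directly, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.homepage.com"
    checks = [
        ("Googlebot", "/SecretAgents.html"),
        ("Roverdog", "/Porn/RudeStuff.php"),
        ("Googlebot", "/index.html"),
        ("SomeOtherBot", "/SecretAgents.html"),  # hypothetical spider not named in RULES
    ]
    for agent, path in checks:
        verdict = "allowed" if is_allowed(RULES, agent, site + path) else "blocked"
        print(f"{agent:13} {path:22} {verdict}")
```

Spiders that are not named in the rules, such as the hypothetical SomeOtherBot above, are not restricted by them.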
You can find user agent names in your own server logs by checking for requests to robots.txt. Most major search engines use short names for their spiders. Alternatively, you can look up a published list of bots.
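Checking the logs by hand gets tedious, so here is a rough sketch of how it could be scripted in Python. It assumes your server writes the Apache "combined" log format, where the user agent is the last quoted field. The SAMPLE lines are invented for illustration, and robots_txt_agents is a made-up helper name, so adjust the pattern to suit your own logs.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Other servers need a different pattern.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample lines (not taken from a real log).
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Oct/2005:13:56:02 -0700] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.9 - - [10/Oct/2005:14:01:11 -0700] "GET /robots.txt HTTP/1.0" '
    '404 210 "-" "Roverdog/1.0"',
]


def robots_txt_agents(lines):
    """Count the user agent strings that requested /robots.txt."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group("path").split("?")[0] == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    for agent, hits in robots_txt_agents(SAMPLE).most_common():
        print(hits, agent)
```

To run it against a real log, pass an open file instead of SAMPLE, since the function accepts any iterable of lines.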

Now save your file as robots.txt for the final time and upload it to your site's root directory. Unlike .htaccess files, which can be placed in any directory, you only need one robots.txt file, and it must be in the root directory.
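To see why the root directory matters, here is a small Python sketch of how a spider could work out where to look for the file. Whatever page it starts from, it ends up at the same root-level address. The function name robots_url is made up for this illustration, and www.homepage.com is the example site used earlier.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url."""
    # A path starting with "/" replaces the whole path of the base URL.
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("http://www.homepage.com/secret/index.html"))
    print(robots_url("http://www.homepage.com/"))
```

A copy of the file saved inside a subdirectory such as /secret/ would never be requested.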

I hope you have found this short tutorial easy to follow, useful and informative!

Please contact me in the forums if you have any problems with the above lines of code or have any suggestions or related feedback.




 





Copyright © 2000-2006, Milano Interactive