robots.txt -- Hiding files and dirs from search bot spiders by DavH27
Do you ever wonder why Google sometimes returns obscure pages, such as custom error pages? Often it is because the site's author has not uploaded a robots.txt file to the site's root directory to tell search engine spiders which pages to leave alone.
Type the relevant lines below into a plain text editor such as Notepad, and be sure to save the file as robots.txt, all in lowercase. Spiders request exactly /robots.txt, and many web servers treat file names as case-sensitive.
You can stop all spiders from crawling the entire site (not recommended if you want to be listed at all):
Code:
User-agent: * #This means all spiders
Disallow: / # This means the entire site.
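If you want to check what a set of rules actually blocks before uploading it, Python's standard-library `urllib.robotparser` module can parse the rules from a string. The sketch below (Python is my choice here, and www.homepage.com is just the placeholder domain this tutorial uses) feeds it the block-everything rules and asks whether two example spiders may fetch two example pages.

```python
from urllib.robotparser import RobotFileParser

# The "block the whole site" rules from the tutorial, held in a string.
ROBOTS_TXT = """\
User-agent: * #This means all spiders
Disallow: / # This means the entire site.
"""

parser = RobotFileParser()
# parse() takes the file's lines directly; nothing is downloaded.
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot"):
    for url in ("http://www.homepage.com/", "http://www.homepage.com/about.html"):
        # can_fetch() returns True if the rules let this spider request the URL.
        print(agent, url, parser.can_fetch(agent, url))
```

Every check prints False, because the `*` group applies to any spider and `/` is a prefix of every path on the site.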
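You can also shut out just one spider from part of the site:
Code:
User-agent: Googlebot #Only Googlebot is specified to be disallowed
Disallow: /secret #This stops Googlebot from crawling any URL whose path starts with /secret: everything in www.homepage.com/secret/, but also /secrets.html. Write /secret/ to match only the directory.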
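Rules are matched against the start of the URL path, so `Disallow: /secret` covers more than just the directory. Here is a sketch using the same `urllib.robotparser` module; the page names are made up for illustration, and Bingbot is just an example of a spider the rules do not name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /secret
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://www.homepage.com/secret/plans.html",  # inside the /secret directory
    "http://www.homepage.com/secrets.html",       # not in it, but still starts with /secret
    "http://www.homepage.com/index.html",         # unaffected by the rule
]
for url in urls:
    # Only the Googlebot group exists, so other spiders fall through to "allowed".
    print("Googlebot", url, parser.can_fetch("Googlebot", url))
    print("Bingbot  ", url, parser.can_fetch("Bingbot", url))
```

Googlebot is refused the first two URLs and allowed the third. Bingbot is not named in any group and there is no `*` group, so it is allowed everywhere.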
Code:
User-agent: Googlebot
User-agent: Roverdog
Disallow: /SecretAgents.html
Disallow: /Porn/RudeStuff.php
# This one shows that more than one spider can be specified. It also shows that individual files can be disallowed as well as whole directories. The file paths have to be relative to the site's root directory, though, and must start with a slash.
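A rule path that does not start with a slash, such as `Disallow: SecretAgents.html`, is not valid under the robots.txt standard (RFC 9309), and Python's `urllib.robotparser` simply never matches it. The sketch below compares that form with `/SecretAgents.html` for both named spiders and for one spider that is not named (Bingbot, chosen only as an example). Other crawlers may be more forgiving, but it is safer not to rely on that.

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# One group naming two spiders; the file rule is missing its leading slash.
AS_WRITTEN = """\
User-agent: Googlebot
User-agent: Roverdog
Disallow: SecretAgents.html
Disallow: /Porn/RudeStuff.php
"""

# The same group with the file path written relative to the site root.
WITH_SLASH = AS_WRITTEN.replace("Disallow: SecretAgents.html", "Disallow: /SecretAgents.html")

SITE = "http://www.homepage.com"
for label, text in (("no slash  ", AS_WRITTEN), ("with slash", WITH_SLASH)):
    parser = build_parser(text)
    for agent in ("Googlebot", "Roverdog", "Bingbot"):
        for path in ("/SecretAgents.html", "/Porn/RudeStuff.php"):
            # True means the spider is allowed to fetch the page.
            print(label, agent, path, parser.can_fetch(agent, SITE + path))
```

Without the slash, Googlebot and Roverdog are still allowed to fetch /SecretAgents.html; with it, both are refused. Both named spiders are refused /Porn/RudeStuff.php either way, while Bingbot is allowed everything.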
You can find user agent names in your own logs by checking for requests to robots.txt. Most major search engines have short names for their spiders. Or you can look the name up in one of the published lists of bots.
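Checking your logs by hand gets tedious, so here is a small sketch that pulls out the User-Agent string of every request for /robots.txt. It assumes your server writes the common Apache/nginx "combined" log format, and the two log lines are hypothetical examples, not real traffic; the function name `robots_txt_agents` is my own.

```python
import re

# Apache/nginx "combined" log format (an assumption; adjust for your server):
# host ident user [time] "METHOD PATH PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def robots_txt_agents(lines):
    """Return the distinct User-Agent strings that requested /robots.txt."""
    agents = set()
    for line in lines:
        match = COMBINED.match(line)
        if match and match.group("path") == "/robots.txt":
            agents.add(match.group("agent"))
    return agents

# Hypothetical log lines, using documentation-only IP addresses.
SAMPLE_LOG = [
    '198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

print(robots_txt_agents(SAMPLE_LOG))
```

Only the first line asked for /robots.txt, so only its agent string is returned. The short name to put in your User-agent line (Googlebot here) is the product token inside that long string.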
Now save your file as robots.txt for the final time and upload it to your site's root directory. Unlike .htaccess files, you only need one of these files, and it must be in the root directory; spiders will not look for it anywhere else.
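Spiders do not search your folders for the file; they take the scheme and host name of whatever page they are about to crawl and request /robots.txt from that. A sketch of that derivation in Python (the function name `robots_url` is my own, not part of any library):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt address that governs page_url.

    Only the scheme and host (with any port) are kept; the page's own
    directory is thrown away, which is why the file must sit in the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

for page in (
    "http://www.homepage.com/secret/plans.html",
    "http://www.homepage.com/SecretAgents.html?page=2",
):
    print(page, "->", robots_url(page))
```

Both pages map to http://www.homepage.com/robots.txt, so a copy saved in /secret/ would never be requested. To test the uploaded file, `RobotFileParser.set_url()` followed by `read()` will fetch it from that address.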
I hope you have found this short tutorial easy to follow, useful and informative!
Please contact me in the forums if you have any problems with the above lines of code or have any suggestions or related feedback.