
Thursday, August 4, 2011

Robots.txt

The Robot Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web crawlers and other web robots from accessing all or part of a website which is otherwise publicly viewable.

Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.

Robots.txt is a plain text (not HTML) file you put at the root of your site to tell search robots which pages you would like them not to visit. Compliance with robots.txt is entirely voluntary, but reputable search engines generally honor what they are asked not to crawl.

The format is simple enough for most intents and purposes: a User-agent: line to identify the crawler in question, followed by one or more Disallow: lines that tell it not to crawl certain parts of your site.

The basic structure of a robots.txt file:
User-agent: *
Disallow: /

Note that this particular example is the most restrictive one possible: it asks every robot (the * wildcard) to stay away from the entire site (the / path).
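A more realistic file usually combines several rules. The sketch below is a hypothetical example (the paths and crawler name are made up for illustration): it blocks everyone from two private directories, blocks one named crawler entirely, and points robots at a sitemap, which ties in with the Sitemaps inclusion standard mentioned above.

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /tmp/

# Block one specific crawler from the whole site
User-agent: BadBot
Disallow: /

# Optional: point crawlers at the sitemap
Sitemap: http://www.example.com/sitemap.xml
```

An empty Disallow: line (or no Disallow line at all for a user agent) means that crawler may visit everything.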

Google Blogger has introduced a robots.txt file for each Blogger blog.
To check the robots.txt file of your Blogger blog, just type the following URL into the address bar of your browser.

http://www.yourblogname.blogspot.com/robots.txt
(replace yourblogname with your blog name).
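Beyond viewing the file in a browser, you can also test programmatically whether a given URL is blocked. Python's standard library ships a robots.txt parser for exactly this. A minimal sketch, using hypothetical rules and an example.com URL rather than fetching a live blog:

```python
import urllib.robotparser

# Hypothetical robots.txt rules for illustration
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)  # parse the rules from a list of lines

# Ask whether the generic crawler "*" may fetch each URL
print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False: blocked
print(rp.can_fetch("*", "http://example.com/index.html"))         # True: allowed
```

To check a live site instead, you could call rp.set_url("http://www.yourblogname.blogspot.com/robots.txt") followed by rp.read(), and then use can_fetch() the same way.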
