Saturday, December 29, 2012

How To Create And Use a "robots.txt" File


What is a robots.txt file?

Robots.txt is a plain-text (not HTML) file that you put on your site to tell search robots which pages you would like them not to visit. It restricts how search engine robots that crawl the web access your site. These bots are automated, and before they access the pages of a site, they check whether a robots.txt file exists that asks them to stay away from certain pages. Obeying robots.txt is by no means mandatory for search engines, but they generally respect what they are asked not to do. It is important to understand that robots.txt cannot actually prevent a robot from crawling your site: it is a request, not a firewall.
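
As a rough illustration of that check, here is a minimal Python sketch of what a well-behaved robot might do before fetching a page. It uses the standard library's urllib.robotparser module; the sample rules, the robot name "ExampleBot" and the example.com URLs are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real robot would download it from the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /junk-directory/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/junk-directory/old.html",
    ):
        allowed = is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url)
        print(url, "->", "may fetch" if allowed else "should skip")
```

Nothing forces a robot to run a check like this, which is why robots.txt works as a request rather than a barrier.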


To see which URLs Google, Bing or another search engine has been blocked from crawling, visit the Blocked URLs page in the Health section of Webmaster Tools, or simply enter the URL into the search bar.

Why do you need it?

You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything on your site, you don't need a robots.txt file, not even an empty one.

In order to use a robots.txt file, you need access to the root of your domain. If you don't have access to the root of the domain, you can restrict access using the robots meta tag instead.
Example:

To prevent all robots from indexing a page on your site, place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex">
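
To confirm that the tag really ended up in a page, you could scan the page's HTML for it. The following is a small sketch using Python's built-in html.parser module; the class and function names are invented for this example, and it only looks for the generic "robots" meta name.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # The content attribute may hold several comma-separated directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with "noindex"."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

For example, has_noindex('<head><meta name="robots" content="noindex"></head>') should report that the page asks not to be indexed.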

How To Create a robots.txt file?


The simplest robots.txt file contains two rules:
  • User-agent: the robot the following rule applies to
  • Disallow: the URL you want to block
For example:
User-agent: *
Disallow: /
The "User-agent: *" line means this section applies to all robots. The "Disallow: /" line tells the robot that it should not visit any pages on the site.


You can include as many entries as you want. You can include multiple Disallow lines and multiple user-agents in one entry.
Each section in the robots.txt file is separate and does not build upon previous sections.
For example:
User-agent: *
Disallow: /private_file.html

User-agent: *
Disallow: /junk-directory/
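
If you would rather generate the file than type it by hand, a short script can do it. This is only a sketch: the function name build_robots_txt and the (user agents, paths) input format are assumptions made for this example, not part of any standard tool.

```python
def build_robots_txt(entries):
    """Render (user_agents, disallowed_paths) pairs as robots.txt text.

    Each pair becomes one entry: its User-agent lines first,
    followed by its Disallow lines.
    """
    blocks = []
    for user_agents, disallowed_paths in entries:
        lines = [f"User-agent: {agent}" for agent in user_agents]
        lines += [f"Disallow: {path}" for path in disallowed_paths]
        blocks.append("\n".join(lines))
    # Entries are separated from each other by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_robots_txt([
        (["*"], ["/private_file.html", "/junk-directory/"]),
    ]), end="")
```

Passing several user agents or several paths in one pair produces a single entry with multiple User-agent and Disallow lines.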

Blocking user-agents

The Disallow line lists the content you want to block. You can list a specific URL, a directory, or a pattern. Each entry should begin with a forward slash (/).
  • To block the entire site
    Disallow: /
  • To block a directory and everything in it
    Disallow: /junk-directory/
  • To block a page
    Disallow: /private_file.html
  • To block access to all subdirectories that begin with "private"
    User-agent: Googlebot
    Disallow: /private*/
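
To get a feel for how these patterns match, here is a small Python sketch. It assumes the matching style described by Google, where a rule matches any path that starts with it and "*" stands for any run of characters (a trailing "$", not used above, anchors the end of the URL). The function name rule_matches is made up, and other robots may treat wildcards differently or ignore them.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern covers the given URL path.

    Assumed semantics: prefix match, "*" matches any characters,
    and a trailing "$" requires the path to end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, which gives prefix matching.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for pattern, path in (
        ("/", "/any/page.html"),
        ("/junk-directory/", "/junk-directory/old.html"),
        ("/private_file.html", "/public_file.html"),
        ("/private*/", "/private-photos/2012.html"),
    ):
        print(pattern, path, rule_matches(pattern, path))
```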
    
    
Finally, save your robots.txt file to the highest-level directory of your site. The file must reside in the root of the domain and must be named "robots.txt".
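
In other words, robots look for the file at one fixed address per site. As a quick sketch, this Python snippet works out that address from any page URL using the standard urllib.parse module; the function name and the example.com URL are only placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/shop/item.html?id=3"))
```

A robots.txt file placed in a subdirectory such as /shop/ would not be found at that address.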
If you have any problems with Blogspot, see this article:
Having problems in Blogger/Blogspot?
    
    
