summerwilkins
Wed 18 May 2011, 08:51 pm GMT +0200
Robots.txt is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or the site as a whole. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. Robots.txt lets you tell Google just that.
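To make the /images example concrete, here is a minimal sketch of what such rules might look like and how a rule-following robot would read them. The file contents, the example.com URLs and the "OtherBot" name are made up for illustration; the parsing is done with Python's standard urllib.robotparser module, which is only one possible interpreter of these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not taken from a real site):
# ask Googlebot to skip /images/, and ask every other robot to skip /private/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is asked to stay out of /images/ but may crawl everything else.
print(parser.can_fetch("Googlebot", "https://example.com/images/logo.png"))
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))

# Other robots fall under the "*" group, which only mentions /private/.
print(parser.can_fetch("OtherBot", "https://example.com/images/logo.png"))
print(parser.can_fetch("OtherBot", "https://example.com/private/a.html"))
```

On a real site the rules would live in a file named robots.txt at the root of the site, since that is where robots look for it.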
It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site, i.e. it is not a firewall or a kind of password protection. Putting up a robots.txt file is something like putting a note "Please, do not enter" on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.
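The "note on an unlocked door" idea can be sketched in a few lines. The rules, URLs and robot names below are hypothetical, and the two functions only model how a robot decides what to request; the point is that the check happens inside the robot itself, so a robot that skips it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask all robots to stay out of /private/.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /private/"])

urls = [
    "https://example.com/index.html",
    "https://example.com/private/report.html",
]


def polite_plan(agent, candidates):
    """A well-behaved robot drops every URL the rules disallow."""
    return [u for u in candidates if parser.can_fetch(agent, u)]


def rude_plan(agent, candidates):
    """A badly behaved robot never consults the rules at all."""
    return list(candidates)


print(polite_plan("GoodBot", urls))  # only the index page remains
print(rude_plan("BadBot", urls))     # both URLs, including /private/
```

Real protection for sensitive pages has to come from the server side, for example password protection, because that is the only place where a request can actually be refused.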