Robots.txt files are plain text files that give web crawlers instructions about how to crawl a website's pages. They are sometimes useful for SEO, but they must be handled with care: improper implementation can cause major problems, such as inadvertently blocking the entire website from being crawled.
What Is Robots.txt Used For?
Robots.txt files apply to any web crawler, not just Google's. A file can target only Google's crawlers, every crawler except Google's, or any combination in between.
That said, you can get a feel for real-world applications of robots.txt files by looking at Google's own suggestions for using them. Google suggests using robots.txt for:
- Disallowing crawling of the entire website
- Disallowing crawling of a directory and its contents
- Allowing access to a single crawler
- Allowing access to all but a single crawler
- Disallowing crawling of a single web page
- Blocking a specific image from Google images
- Blocking all images on a site from Google images
- Disallowing crawling of files of a specific file type (e.g., GIF files)
- Disallowing crawling of the entire site, but showing AdSense ads on the website’s pages
- Matching URLs that end with a specific string (e.g., .xls)
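Several of the uses above can be expressed in a short robots.txt file. The paths and rules below are purely illustrative, not recommendations for any particular site:

```
# All crawlers: block a directory and its contents,
# plus any URL ending in .gif ($ anchors the match to the end of the URL)
User-agent: *
Disallow: /drafts/
Disallow: /*.gif$

# Google Images only: block a specific image
User-agent: Googlebot-Image
Disallow: /images/private-photo.jpg
```

Each `User-agent` line starts a group of rules, and the `Disallow` lines beneath it list the paths that group of crawlers should not fetch.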
Robots.txt files are useful for cordoning off certain pages or groups of pages from crawlers, and for keeping crawlers away from certain types of files, such as images. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.
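To see how a crawler interprets these rules, Python's standard `urllib.robotparser` module can evaluate a robots.txt file against candidate URLs. The rules and URLs in this sketch are made up for illustration:

```python
from urllib import robotparser

# Hypothetical robots.txt: block the /private/ directory for all crawlers
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved crawler checks each URL before fetching it
print(rp.can_fetch("*", "https://example.com/private/data.html"))  # False
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
```

In production, `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` fetches the live file instead of parsing a string.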
In addition, robots.txt files can be used to slow the rate of crawling on a site, which can be very helpful for big sites that draw a lot of crawler traffic: spacing out crawler requests can reduce or prevent server overloads that slow down page loading or cause a site to crash.
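Crawl delays are typically requested with the non-standard `Crawl-delay` directive, which some crawlers honor but Googlebot ignores. A hypothetical example:

```
# Ask crawlers that support Crawl-delay to wait 10 seconds between requests
User-agent: *
Crawl-delay: 10
```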
What Is Robots.txt Used for in SEO?
Robots meta directives, which are distinct from robots.txt, are usually a better approach for SEO when the goal is to keep web pages out of the index or to prevent links from being followed. Robots meta directives are interpreted as stronger commands, and they are quite a bit easier to implement.
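For comparison, a robots meta directive is a single tag in a page's `<head>`. This sketch shows a page asking crawlers not to index it and not to follow its links:

```html
<head>
  <!-- Tell crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Because the directive lives on the page itself, the crawler must be able to fetch the page to see it.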
Additionally, if robots meta directives are improperly executed, the damage is generally less severe than what can occur with mishandled robots.txt files. This is because robots meta directives are set page by page, whereas robots.txt rules can govern how a crawler treats an entire website or directory.
Meta directives are also a more effective means of keeping duplicate content out of the index, although there are exceptions. Their advantage in resolving duplicate-content issues is that the duplicate content is still crawled; when robots.txt is used to block duplicate content, that content is not crawled at all. This matters because Google prefers access to crawl the entire website, since that lets it spot errors and index pages that might otherwise be missed.
Whether you use robots.txt or robots meta directives to instruct crawlers, use one or the other for any given page. Combining them is worse than redundant: if robots.txt blocks a page from being crawled, crawlers will never see the meta directive on that page.
Potential SEO Problems With Robots.txt
One problem with robots.txt files for SEO is that if you use them to block a page or set of pages, links on those pages will not be followed, so any link value they would pass is lost. Worse, if the only links to a page sit on the blocked page(s), the linked-to page(s) may never be indexed.
Another issue with robots.txt, and one that often applies to meta directives as well, is that crawlers are not obligated to obey them. Crawlers visiting a site with malicious intent will simply ignore them, so you should not (and cannot) rely on robots.txt or meta directives to protect sensitive data on your site. The safer approach is to password-protect sensitive content, such as private user information.
Many websites have no need for a robots.txt file at all, and it is best to avoid one unless you have a specific need that cannot be addressed better in another fashion. Partner with an SEO-savvy web developer to make sure robots.txt files are set up properly and implemented only when necessary.