In this post we will cover robots.txt tips and a short tutorial. Use the power of the robots.txt file to guide and control search engine crawlers (spiders or robots). Your website should have a robots.txt file located at its root so that it can be accessed as http://example.com/robots.txt or http://www.example.com/robots.txt.
With the help of the robots.txt file, you can tell search engine crawlers to crawl or to skip certain pages, web directories, or URL paths.
How to create/generate robots.txt file
You can create a basic robots.txt file by hand, or generate one through Google Webmaster Tools.
1. Open a new text file in Notepad or any plain-text editor.
2. Add the following rules. Either of the blocks below allows all website pages to be crawled.
User-agent: *
Allow: /
or
User-agent: *
Disallow:
3. Save the file with the name robots.txt.
4. Upload this file to your website's root folder.
5. Browse to the file at http://www.example.com/robots.txt or http://example.com/robots.txt, whichever matches your preferred web address.
6. Now test your robots.txt file in Google Webmaster Tools.
If you get a 500 or 404 error when accessing this file, contact your webmaster or website developer.
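You can also sanity-check your rules offline with Python's standard library robots.txt parser. This sketch (assuming Python 3; the sample paths are hypothetical) parses both "allow all" variants from step 2 and confirms each one permits any URL:

```python
import urllib.robotparser

def allows_everything(robots_lines):
    # Parse robots.txt rules from a list of lines (no network access needed).
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_lines)
    # Check a few representative paths for an arbitrary crawler.
    return all(rp.can_fetch("*", path)
               for path in ("/", "/page.html", "/dir/file"))

variant_a = ["User-agent: *", "Allow: /"]
variant_b = ["User-agent: *", "Disallow:"]
print(allows_everything(variant_a))  # True
print(allows_everything(variant_b))  # True
```

Both variants are equivalent: an empty Disallow line means nothing is blocked.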
Robots.txt Tips for SEO
Allow all web pages to be crawled
User-agent: *
Allow: /
or
User-agent: *
Disallow:
Disallow a specific path or folder from crawling
User-agent: *
Disallow: /folder
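Keep in mind that Disallow is a prefix match: /folder blocks the folder itself, everything under it, and even other URLs that start with the same string (such as /foldername); use /folder/ if you only want the directory. A quick offline check with Python's urllib.robotparser (hypothetical paths):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /folder"])

print(rp.can_fetch("*", "/folder/page.html"))  # False: under the blocked prefix
print(rp.can_fetch("*", "/folder"))            # False: the folder itself
print(rp.can_fetch("*", "/foldername"))        # False: shares the same prefix
print(rp.can_fetch("*", "/other/page.html"))   # True: not blocked
```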
Robots.txt Wildcard Matching
Use wildcards to disallow URLs by query string or file extension.
Disallow all URLs with a query string
User-agent: *
Disallow: /*?
Disallow all URLs that end with .asp
User-agent: *
Disallow: /*.asp$
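Note that Python's urllib.robotparser does plain prefix matching and does not understand the * and $ wildcards, so to test wildcard rules you can translate a pattern into a regular expression yourself. This is a minimal sketch of the matching that major crawlers implement natively (function name and sample paths are my own):

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    # Translate a robots.txt pattern into a regular expression:
    # '*' matches any sequence of characters, a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    regex = "^" + regex + ("$" if anchored else "")
    return re.search(regex, path) is not None

print(rule_matches("/*?", "/products?page=2"))   # True: has a query string
print(rule_matches("/*.asp$", "/old/page.asp"))  # True: ends with .asp
print(rule_matches("/*.asp$", "/page.aspx"))     # False: $ anchors the end
```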
Robots.txt Advanced Tips
If you have a very large website, you can use the Crawl-delay directive so that crawlers do not hurt your website's performance. Note that Googlebot ignores Crawl-delay; for Google you set the crawl rate in Google Webmaster Tools instead, while crawlers such as Bingbot honor the directive.
Example –
User-agent: Googlebot
Crawl-delay: 10
Where the value 10 is the delay in seconds.
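Python's urllib.robotparser can read the Crawl-delay value back out, which is a handy way to verify the directive parses as intended (a sketch using the example rules above):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: Googlebot", "Crawl-delay: 10"])

print(rp.crawl_delay("Googlebot"))  # 10
print(rp.crawl_delay("bingbot"))    # None: no rule for this crawler
```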
Write different rules for different crawlers
Example –
User-agent: *
Disallow: /folder1
Disallow: /folder2
User-agent: Googlebot
Disallow: /folder3
User-agent: bingbot
Disallow: /folder4
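Keep in mind that a crawler obeys only the most specific User-agent group that matches it; the * group applies only when no named group does. So in the file above, Googlebot is blocked from /folder3 but ignores the /folder1 and /folder2 rules. Python's urllib.robotparser reflects this behavior (a sketch; "SomeBot" is a made-up crawler name):

```python
import urllib.robotparser

rules = [
    "User-agent: *",
    "Disallow: /folder1",
    "Disallow: /folder2",
    "",
    "User-agent: Googlebot",
    "Disallow: /folder3",
    "",
    "User-agent: bingbot",
    "Disallow: /folder4",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "/folder3/a"))  # False: blocked by its own group
print(rp.can_fetch("Googlebot", "/folder1/a"))  # True: the * group does not apply
print(rp.can_fetch("SomeBot", "/folder1/a"))    # False: unnamed crawlers use *
```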