In blogging, you constantly hear about robots.txt, and most bloggers want to optimize their robots.txt file, but they skip it because of a few confusing commands.
Do you know: if a blogger wants traffic to his/her website, the site has to get several crucial SEO factors right. To rank well in Google and other search engines, you have to do some work with robots.txt, so we must learn how to optimize the robots.txt file.
Why should we know about robots.txt?
The strong reason is that it is an essential factor in SEO. The robots.txt file plays a major role in your page rankings, because a web crawler crawls your site according to the instructions in robots.txt.
How to optimize robots.txt:
First of all, to know how to optimize robots.txt you must learn some basics about robots.txt. Implementing a proper robots.txt file will also boost your page rankings. So let me not waste much time and go straight into the topic in detail, i.e. all about robots.txt and how to optimize it.
Firstly, let me explain
What is robots.txt?
A robots.txt file is a set of instructions which informs Googlebot and other search bots whether they may scan a web page, or particular content on that page. The file is placed in the root directory of the web server.
Before Googlebot crawls your site, it checks whether any limitations are in place. It looks for instructions telling it which files it may or may not access. But now you are thinking: how do you write those instructions?
Well, be patient, I will explain. Before getting to the instructions and optimizing robots.txt, you should know some basics of robots.txt.
Basics of robots.txt:
To implement a robots.txt file, you must make use of the following commands.
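The default, allow-all file looks like this (an empty Disallow value means nothing is blocked):

```
User-agent: *
Disallow:
```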
This is the default robots.txt file. It permits Googlebot to access your site completely. This particular code appears on most web pages, and it is the default robots.txt file in WordPress.
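By contrast, blocking the whole site takes only a single slash after Disallow:

```
User-agent: *
Disallow: /
```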
In this file, you are blocking your entire site just by providing a slash. You are simply not permitting Googlebot to access anything.
- Now, if you want to block a particular folder but not the complete site, use this code:
```
User-agent: *
Disallow: /music/
```
With this code, you are letting Googlebot access everything except the music folder.
Note: I have used music in this example to make it easy to understand. You can replace it with any folder you wish.
- If you want, you can also block a single file (not a folder):
```
User-agent: *
Disallow: /file5.html
```
With the above command, you are not giving Googlebot the right to access that file.
So far, in the basics of the robots.txt file, you have only seen disallow instructions. There is an allow command as well.
Here it is:
```
User-agent: *
Disallow: /images
Allow: /images/fusion.jpg
```
Now, here you can see how the rules are evaluated. As I explained earlier, `Disallow: /images` blocks the images folder. But the additional command `Allow: /images/fusion.jpg` then permits access to just the fusion image inside the images folder.
A sitemap rule can be placed anywhere in the robots.txt file; mostly it is placed either at the top or the bottom of the instructions.
It is coded as:

```
Sitemap: https://www.myvash.com/sitemap.xml
```
- You can also ask a specific crawler to slow down:

```
User-agent: bingbot
Crawl-delay: 60
```

This enforces a time interval, here 60 seconds, between crawl requests to the server. So, these are the most important basics of robots.txt, and every blogger should know them for a good ranking.
NOTE: GOOGLEBOT IS A KIND OF WEB CRAWLER. THERE ARE VARIOUS OTHER WEB CRAWLERS; BINGBOT IS ANOTHER.
Learning the basics is not enough; you should know what those commands really mean. I feel it is necessary for you to know the definitions used in the commands, and I strongly recommend you go through this part, or else you will miss something.
Here it is:
Definition of the commands:
User-agent can be described as the term used to specify which web crawler (Googlebot or any other crawling bot) the directions are aimed at.
- User-agent: *
If you want to provide directions to all robots, use * after the command "User-agent:". It says that the rules apply to every web crawler (Googlebot and other bots).
- User-agent: Googlebot
Here the directions apply specifically to Googlebot. In the same way, you can target the Google AdSense bot or any other bot.
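For example, you can give one bot its own rules while every other crawler follows the general group (the folder names here are just placeholders):

```
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /archives/
```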
- Disallow: As you already know the meaning of the word disallow. This part simply tells crawlers which parts of the site they must not visit.
For example: if I want Googlebot to access my web page, but not a specific image or folder, I can simply exclude it with a disallow rule. Say I don't want the image named fusion to be crawled; I just disallow it:
```
User-agent: *
Disallow: /fusion.jpg
```
This is telling the bots: "User-agent: *" applies the rule to all robots, and "Disallow: /fusion.jpg" says not to visit that particular image named fusion.
Allow: Obviously, it grants access to a particular file inside a folder. This command applies when you have blocked a folder but don't want a specific file in that folder to be blocked.
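For instance, to block a folder but keep one file inside it crawlable:

```
User-agent: *
Disallow: /images
Allow: /images/fusion.jpg
```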
Sitemap: Nowadays, the sitemap is another important rule used in robots.txt. It tells crawlers where your sitemap file is located, which helps your pages get discovered and indexed. A sitemap rule can be placed anywhere in the robots.txt file, though generally it appears either at the top or the bottom.
Ex: taking my own website as an example, if I want to point crawlers to my sitemap, I use the Sitemap rule in robots.txt as follows:
```
Sitemap: https://www.myvash.com/sitemap.xml
Sitemap: https://www.myvash.com/sitemap_index.xml
```
Crawl-delay: This is a non-standard rule. It defines the time (usually in seconds) a crawler should wait between successive requests to the server. Some search engines, such as Bing, support crawl-delay directly (Google ignores it); it lets you dictate the number of seconds between requests for a specific user-agent.
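For example, asking Bing's crawler to wait 60 seconds between requests:

```
User-agent: bingbot
Crawl-delay: 60
```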
I hope the commands and their definitions are clear now. So make a decision whether to leave the file as default or add limitations. Next comes the place or platform where you actually create it.
Where to build up a robots.txt file?
Robots.txt is a plain text file which can easily be created with any text editor, like Notepad. You can also copy and paste from other sources. So isn't it easy to build a robots.txt file?
Do's and Don'ts of Commands in Robots.txt:
Here you will learn the do's and don'ts of the commands, or rather the correct way of using them to avoid errors.
Here is the list of do's and don'ts as an infographic:
Commenting in robots.txt:
It might happen that we get confused about rules we wrote in the past, or even forget the reason for a rule we applied. So, for our own convenience, we can add comments as a reminder. A comment can be considered a description of the code. Web crawlers and bots ignore comments, but they are helpful if we use them.
Now see the implementation here:
Just as in other coding languages, here too we use the "#" symbol. A comment can be placed on the line above a rule or at the end of the rule's line.
Ex 1:

```
# blocks the image folder from crawling
User-agent: Googlebot
Disallow: /image
```

Ex 2:

```
User-agent: Googlebot
Disallow: /image # hide the image folder
```
Oh!! I think that clears up the doubts you had in your mind. Please be patient; I have one last thing, which is testing.
Testing your robots.txt file:
After implementing everything, it is time for testing. You should test whether any file or page is affected by your robots.txt. You can test it in Google Search Console.
Sign in with any of your Google accounts. Go to the Dashboard, find the menu on the left side, then go to the Crawl option and choose robots.txt Tester. It will open the robots.txt tester, where you can check your robots.txt for errors:
There are also various online testing tools available; you can find them on Google.
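If you prefer to test locally, Python's built-in `urllib.robotparser` module can check your rules before you upload the file. Here is a minimal sketch; the rules and URLs below are just examples:

```python
from urllib.robotparser import RobotFileParser

# Example rules, exactly as they would appear in your robots.txt
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /trackback/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Check whether a given crawler may fetch a given URL
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/my-post/"))              # True
```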
If you want to see anyone's robots.txt file, just type their website address followed by "/robots.txt" and it will be displayed. If you want to take a look at mine, here it is: myvash.com/robots.txt.
So just start creating your robots.txt file, or edit mine or anyone else's, keeping only what is important to you. If you want an already optimized robots.txt, here it is. This optimized robots.txt contains a sitemap for faster indexing, along with the basic, best search-optimized SEO rules. You can include your sitemap either at the start or at the end, and you can add comments.
```
Sitemap: https://www.myvash.com/sitemap.xml
Sitemap: https://www.myvash.com/sitemap-image.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/

User-agent: NinjaBot
Allow: /

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
```
REMINDER: WHEN YOU COPY A ROBOTS.TXT FILE FROM OTHERS, MAKE SURE YOU REMOVE THEIR DOMAIN NAME AND WRITE YOURS, AND ALSO CHECK WHETHER EACH RULE IS USEFUL FOR YOU OR NOT. IN THIS CASE, YOU MUST REMOVE "MYVASH" AND ADD YOUR OWN DOMAIN NAME.
Your robots.txt file shouldn't exceed 200 disallow lines. Start with a few disallow lines and add more later if you wish. Before submitting, take care with your instructions and check them, because once you disallow a link and then want to regain access, it can take months for it to be recrawled. The robots.txt file is very important to Google.
Google treats robots.txt as an authoritative set of rules for what not to crawl. If you disallow a link, it may happen that Google removes it from the search results completely, so be careful while writing a disallow rule.
I am concluding this article here. Thanks for being so patient and for the zeal you have shown in learning about optimizing robots.txt. I hope it really helped you. If you have any queries or confusion regarding optimizing robots.txt or any of its sub-topics, please don't hesitate to contact us. I will definitely help you. You can reach me through our contact form or just drop a comment.
Thank you!! Enjoy browsing and sharing!!