What a WordPress robots.txt File Is (And Why You Need One)
When you create a new website, search engines will send their minions (or bots) to ‘crawl’ through it and make a map of all the pages it contains. That way, they’ll know what pages to display as results when someone searches for related keywords. At a basic level, this is simple enough.
The problem is that modern websites contain a lot more elements than just pages. WordPress enables you to install plugins, for example, which often come with their own directories. You don’t want these to show up in your search engine results, however, since they’re not relevant content.
What the robots.txt file does is provide a set of instructions for search engine bots. It tells them: “Hey, you can look here, but don’t go into those rooms over there!” This file can be as detailed as you want, and it’s rather easy to create, even if you’re not a technical wizard.
In practice, search engines will still crawl your website even if you don’t have a robots.txt file set up. However, not creating one is inefficient. Without this file, you’re leaving it up to the bots to crawl all your content, and they’re so thorough that they might end up surfacing parts of your website you’d rather keep out of search results.
More importantly, without a robots.txt file, you’ll have a lot of bots crawling all over your website. This can negatively impact its performance. Even if the hit is negligible, page speed is something that should always be at the top of your priorities list. After all, there are few things people hate as much as slow websites (and that includes us!).
Where the WordPress robots.txt File Is Located
When you create a WordPress website, it automatically sets up a virtual robots.txt file and serves it from your site’s root. For example, if your site is located at yourfakewebsite.com, you should be able to visit yourfakewebsite.com/robots.txt and see a file like this come up:
[snippet slug=robot-txt1 lang=abap]
This is an example of a very basic robots.txt file. To put it in human terms, the part right after User-agent: declares which bots the rules below apply to. An asterisk means the rules are universal and apply to all bots. In this case, the file tells those bots that they can’t go into your wp-admin and wp-includes directories. That makes a certain amount of sense since those two folders contain a lot of sensitive files.
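One way to see how a rule-following crawler reads a file like this is Python’s standard-library urllib.robotparser module. The sketch below assumes the file contains exactly the three lines described above (User-agent: *, Disallow: /wp-admin/, Disallow: /wp-includes/); the virtual file on your own install may differ, so paste in whatever your site actually serves at /robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents, based on the rules described above; replace these
# lines with what your own site serves at /robots.txt.
DEFAULT_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(DEFAULT_RULES)
    for path in ["/", "/sample-page/", "/wp-admin/", "/wp-includes/js/jquery.js"]:
        # The "*" group applies to every bot, so any user-agent string gets the same answer.
        print(path, rules.can_fetch("Googlebot", path))
```

The two public paths come back as allowed, while anything under wp-admin or wp-includes comes back as blocked.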
However, you may want to add more rules to your own file. Before you can do that, you’ll need to understand that this is a virtual file. A physical robots.txt file, by contrast, usually lives in your root directory, which is often called public_html or www (or is named after your website).
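The robots.txt file WordPress sets up by default, though, isn’t stored in that directory or any other, because WordPress generates it on the fly whenever a bot requests it. It works, but if you want to make changes to it, you’ll need to create your own file and upload it to your root folder. Once a real robots.txt file exists there, your server delivers it instead of the virtual one.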
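The reason the file has to sit in the root folder is that crawlers don’t search your directories for it; they request the fixed path /robots.txt at the top level of your domain. The small helper below (the function name is our own, purely for illustration) makes that mapping explicit for any page on your site.

```python
from urllib.parse import urlsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check before crawling page_url.

    Only the scheme and host (including any port) matter; the page's own
    directory path is discarded, because robots.txt lives at the top level.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


if __name__ == "__main__":
    print(robots_url_for("https://yourfakewebsite.com/blog/hello-world/"))
```

A robots.txt file uploaded to a subfolder such as /blog/ would therefore never be read.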
We’ll cover several ways to create a new robots.txt for WordPress in a minute. For now, though, let’s talk about how to determine what rules yours should include.
What Rules to Include in Your WordPress robots.txt File
In the last section, you saw an example of a WordPress-generated robots.txt file. It only included two short rules, but most websites set up more than that. Let’s take a look at two different robots.txt files and talk about what each one does differently.
Here is our first WordPress robots.txt example:
[snippet slug=robots-txt lang=abap]
This is a generic robots.txt file for a website with a forum. Search engines will often index each thread within a forum. Depending on what your forum is for, however, you might want to disallow it. That way, Google won’t crawl hundreds of threads about users making small talk. You could also set up rules that disallow specific sub-forums and let search engines crawl the rest.
You’ll also notice a line that reads Allow: / at the top of the file. That line tells bots that they can crawl all of your website’s pages, aside from the exceptions you set below it. Likewise, you’ll notice that we set these rules to be universal (with an asterisk), just as WordPress’s virtual robots.txt file does.
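Since Allow: / matches every URL, you might wonder why it doesn’t cancel out the Disallow lines that follow it. Under the Robots Exclusion Protocol as standardized in RFC 9309 (and as Google documents it), the most specific, meaning longest, matching rule wins, and Allow wins a tie. Here is a minimal sketch of that precedence rule. It assumes a hypothetical /forum/ path, since your forum’s URLs may differ, and it ignores the * and $ wildcards. Python’s built-in urllib.robotparser has historically applied the first matching rule in file order instead, so it isn’t a reliable way to check a file that starts with Allow: /.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled using longest-match precedence.

    `rules` holds (directive, path_prefix) pairs from a single User-agent
    group, e.g. [("allow", "/"), ("disallow", "/forum/")]. Wildcards
    (* and $) are not handled in this sketch.
    """
    best_len = -1
    allowed = True  # if no rule matches, the path is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty value (e.g. "Disallow:") blocks nothing
        if path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # A longer prefix is more specific; on a tie, Allow beats Disallow.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# Hypothetical group modeled on the forum example: allow everything except /forum/.
FORUM_RULES = [("allow", "/"), ("disallow", "/forum/")]

if __name__ == "__main__":
    for p in ["/", "/blog/my-post/", "/forum/", "/forum/off-topic/thread-42/"]:
        print(p, is_allowed(FORUM_RULES, p))
```

Because precedence depends on length rather than position, moving Allow: / below the Disallow line wouldn’t change the outcome for crawlers that follow this rule.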
Now let’s check out another WordPress robots.txt example:
[snippet slug=robots-txt-3 lang=abap]
In this file, we set up the same rules WordPress does by default. However, we also added a new set of rules that block Bing’s search bot from crawling through our website. Bingbot, as you might imagine, is the name of that bot.
You can get pretty specific about which search engine’s bots get access to your website, and which ones don’t. In practice, of course, Bingbot is pretty benign (even if it’s not as cool as Googlebot). However, there are some malicious bots out there.
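User-agent groups are matched per bot: a crawler follows the group that names it and falls back to the * group only if no group does. The sketch below uses Python’s urllib.robotparser again and assumes the file looks like the WordPress default rules plus a User-agent: Bingbot group containing Disallow: /; check it against the file you actually publish.

```python
from urllib.robotparser import RobotFileParser

# Assumed file: WordPress-style defaults for every bot, plus a group that blocks Bingbot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: Bingbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so it falls back to the "*" rules;
# Bingbot matches its named group and is blocked everywhere.
for agent in ("Googlebot", "Bingbot"):
    for path in ("/", "/wp-admin/"):
        print(f"{agent:10} {path:12} {parser.can_fetch(agent, path)}")
```

This also means a named group doesn’t inherit the * rules, so if you add a group for a specific bot, repeat any general rules you still want that bot to follow.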
The bad news is that they don’t always follow the instructions of your robots.txt file (they are rebels, after all). It’s worth keeping in mind that, while most bots will follow the instructions you provide in this file, you aren’t forcing them to do so. You’re just asking nicely.
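To see why robots.txt is a request rather than a lock, it helps to look at where the check actually happens: inside the crawler’s own code. The sketch below shows a hypothetical well-behaved bot (ExampleBot is a made-up name) that consults the rules before each request; a badly behaved bot simply skips the can_fetch() call. The fetch argument stands in for whatever HTTP client a real crawler would use, so the example never touches the network.

```python
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


def polite_get(
    parser: RobotFileParser,
    user_agent: str,
    url: str,
    fetch: Callable[[str], str],
) -> Optional[str]:
    """Fetch url only if the parsed robots.txt allows it for user_agent.

    Nothing on the server enforces this: the crawler chooses to make the
    check, and a rule-ignoring bot would just call fetch(url) directly.
    """
    if not parser.can_fetch(user_agent, url):
        return None  # politely skip the disallowed URL
    return fetch(url)


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request (hypothetical, for illustration)."""
    return f"<html>contents of {url}</html>"


if __name__ == "__main__":
    rules = RobotFileParser()
    rules.parse(["User-agent: *", "Disallow: /wp-admin/"])
    print(polite_get(rules, "ExampleBot", "https://yourfakewebsite.com/", fake_fetch))
    print(polite_get(rules, "ExampleBot", "https://yourfakewebsite.com/wp-admin/", fake_fetch))
```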
If you read up on the subject, you’ll find a lot of suggestions for what to allow and what to block on your WordPress website. In our experience, however, fewer rules are often better. Here’s an example of what we recommend for your first robots.txt file:
[snippet slug=robots-4 lang=abap]
Traditionally, WordPress likes to block access to the wp-admin and wp-includes directories. However, that’s no longer considered a best practice. Plus, if you add metadata to your images for Search Engine Optimization (SEO) purposes, it doesn’t make sense to stop bots from crawling that information. The two rules above cover what most basic sites will require.
What you include in your robots.txt file will depend on your specific site and needs, however. So feel free to do some more research on your own!