The purpose of a robots.txt file

What is a robots.txt file?

robots.txt is a plain-text file that tells search engines which parts of your website they may crawl. Search engines deploy so-called “spiders” or “bots” that crawl the web, reading and indexing the pages they come across. When they reach a website, they look for its robots.txt file to learn what they should crawl and what they should ignore.

How does the robots.txt work?

Spiders and bots travel across the web by following links from website A to B to C, and so on. When they reach a website, they check its robots.txt file (this mechanism is known as the “Robots Exclusion Protocol”) to look up which pages they are allowed to crawl and index.

Are all bots required to obey the information from robots.txt?

No. Compliance is voluntary, so some spiders (bots) can opt to ignore robots.txt. These are usually malware bots, spam bots, or email scrapers. Bots from established search engines such as Google and Bing will adhere to the instructions in the file.

Do I need a robots.txt on my website?

If you want search engines to index your entire website, and there is nothing you want to block access to, then you don’t need to bother with robots.txt at all. When a bot reaches a particular website and does not find the robots.txt file, it will simply proceed to crawl the entire website.

Where do I need to put the robots.txt file?

The robots.txt file must be placed in the website’s top-level (root) directory. You can access any website’s robots.txt by adding /robots.txt to the root domain URL, such as https://www.example.com/robots.txt. It is also important to name the file exactly “robots.txt”: the name is case-sensitive, so “Robots.txt” and “ROBOTS.txt” are incorrect.

What are the basic instructions that robots.txt gives?

Allowing all web crawlers to access all content:

User-agent: *
Disallow:

Blocking all web crawlers from all content:

User-agent: *
Disallow: /

It is also possible to target only specific crawlers that obey the Robots Exclusion Protocol. All the big, established search engines interpret these directives consistently, whether you disallow a single page, a folder, or the entire website.

User-agent: Googlebot
Disallow: /blocked-page.html

User-agent: Bingbot
Disallow: /wp-admin/
Allow: /wp-content/uploads

In the first example, robots.txt blocks Googlebot from visiting a single page. Note that the Disallow directive takes a path relative to the root of the site, not a full URL. Because the rule names only Googlebot and only that path, Googlebot may crawl every other page, and all other bots may crawl and index any page on the website they can reach.

In the second example, Bingbot is not allowed to crawl the administrator folder of a WordPress installation, but it is allowed to crawl and index all the uploaded content (such as images) in the uploads folder.
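Rules like these can also be checked programmatically. As a sketch, Python’s standard-library urllib.robotparser module can load the Bingbot rules from the example above and answer “may this bot fetch this path?” (the file paths used here are made up for illustration):

```python
from urllib import robotparser

# The Bingbot rules from the example above, as a raw string.
rules = """\
User-agent: Bingbot
Disallow: /wp-admin/
Allow: /wp-content/uploads
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Bingbot is blocked from the admin folder...
print(rp.can_fetch("Bingbot", "/wp-admin/options.php"))        # False
# ...but may fetch uploaded content.
print(rp.can_fetch("Bingbot", "/wp-content/uploads/logo.png")) # True
# No rule group matches Googlebot, so nothing is blocked for it.
print(rp.can_fetch("Googlebot", "/wp-admin/options.php"))      # True
```

In a real crawler you would call rp.set_url("https://www.example.com/robots.txt") followed by rp.read() instead of parsing a hard-coded string.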

What are the advantages of using robots.txt?

Each website has a specific “crawl budget”: the number of pages a search engine bot will crawl on that website within a given period. By blocking parts of a website from crawling, you free up that budget for the rest of the site, which matters most when a site has a very large number of pages. It is also good practice to keep bots away from parts of the website that still need to be cleaned up or otherwise fixed before they are ready for the public.
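For example, a site with an unfinished section and low-value internal search pages might use something like the following (the /staging/ and /search/ paths are hypothetical, chosen only to illustrate the idea):

```
User-agent: *
Disallow: /staging/
Disallow: /search/
```

This keeps well-behaved crawlers focused on the pages you actually want indexed.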

If we disallow a page, will it disappear from search results?

Not necessarily. If a bot is not allowed to crawl a specific page, it will not do so. However, if the search engine finds links to the blocked URL on a third-party site, it can still index that URL without ever crawling the page itself. This means the page might show up in search results even if it is disallowed for crawling in the robots.txt file.

If you want to keep a specific page out of search results, you need to use the “noindex” tag instead. Crucially, a page carrying a noindex tag must not be blocked by robots.txt: if the crawler is not allowed to fetch the page, it never sees the tag.
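As a minimal sketch, the noindex directive is placed in the page’s HTML head (it can also be sent as an X-Robots-Tag HTTP response header):

```
<meta name="robots" content="noindex">
```

A crawler that fetches the page reads this tag and drops the page from its index.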
