Robots.txt is one of those technical SEO elements that is easy to overlook. It is a small, plain text file that lives quietly at the root of your website, yet it largely controls what search engines can and cannot crawl on your site. Configured correctly, it helps search engines use their crawl budget efficiently. Configured incorrectly, it can block your most important pages from being crawled, collapse your organic traffic, and take weeks to diagnose because the damage is not immediately visible.
This guide explains what robots.txt is, exactly how it works, the mistakes that most frequently cause SEO problems, and how to configure it correctly for your site.
What Robots.txt Is and What It Does
Robots.txt is a text file stored at the root of your domain, at yoursite.com/robots.txt, that tells web crawlers which parts of your site they are allowed to access and which they should not. It follows a standard called the Robots Exclusion Protocol (formalized as RFC 9309), which all major search engine crawlers respect. Compliance is voluntary, so the file guides well-behaved crawlers rather than enforcing anything.
When a search engine crawler like Googlebot visits your site, it checks robots.txt before crawling anything else. If the file tells it not to crawl certain directories or pages, it will generally respect those instructions and skip those areas. Understanding how Googlebot accesses your pages is the foundation for understanding why robots.txt matters — it is one of the very first things Googlebot reads before deciding what to do next.
The robots.txt file uses simple directives. “User-agent” specifies which crawlers the following rules apply to — a wildcard asterisk (*) means all crawlers. “Disallow” specifies paths that the crawler should not access. “Allow” specifies paths that should be accessible even within a disallowed directory. “Sitemap” can be included to point crawlers to your XML sitemap.
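To make the four directives concrete, here is a minimal sketch that parses a small robots.txt with Python's standard-library urllib.robotparser and asks which URLs a crawler may fetch. The paths, the sitemap URL, and the example.com domain are assumptions made up for illustration. Note that Python's parser applies rules in file order and does plain prefix matching, whereas Google picks the most specific matching rule and supports wildcards, so treat the result as an approximation of Googlebot's behavior. The Allow line is placed first so both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an example site; paths and sitemap URL are illustrative.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit/
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: /private/ blocks everything under that directory...
blocked = parser.can_fetch("Googlebot", "https://www.example.com/private/report.html")
# ...except the sub-path that the Allow line carves out.
carved_out = parser.can_fetch("Googlebot", "https://www.example.com/private/press-kit/logo.png")
# Paths that match no rule are crawlable by default.
public = parser.can_fetch("Googlebot", "https://www.example.com/blog/")
# site_maps() requires Python 3.8 or later.
sitemaps = parser.site_maps()

print(blocked, carved_out, public, sitemaps)
```

Here blocked comes back False, while carved_out and public come back True, and sitemaps lists the single sitemap URL declared in the file.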
What Robots.txt Does Not Do
Understanding the limitations of robots.txt is as important as understanding what it does. A common misconception is that blocking a page in robots.txt prevents it from appearing in search results. This is not reliably true.
If a blocked page has external links pointing to it, Google may still index the URL even though it cannot crawl the content. The page can appear in search results as a URL without a title or description, which is confusing for users and reflects poorly on site quality. Blocking a page in robots.txt prevents Googlebot from reading its content, but does not guarantee the URL stays out of the index. For URLs you want fully excluded from Google's index, the correct tool is a noindex directive (a robots meta tag on the page itself, or an X-Robots-Tag HTTP header), not robots.txt. The page must also remain crawlable: if robots.txt blocks it, Googlebot never fetches the page and never sees the noindex.
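The interaction between noindex and robots.txt is easy to get wrong, so a small check can help. The sketch below, using only the Python standard library, reports whether a page carries a robots meta tag with noindex and whether a crawler is actually allowed to fetch the page and see that tag. The function names, the sample HTML, and the URLs are assumptions for illustration. It inspects the meta tag only, not the X-Robots-Tag header, and Python's robots.txt parser only approximates Google's matching.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html):
    """True if the HTML contains a robots meta tag that includes noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def noindex_is_visible(robots_txt, url, html, agent="Googlebot"):
    """True only if the page carries noindex AND the crawler may fetch it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return has_noindex(html) and rules.can_fetch(agent, url)


# Hypothetical page and rules for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
URL = "https://www.example.com/old-page/"

print(noindex_is_visible("User-agent: *\nDisallow: /tmp/\n", URL, PAGE))
print(noindex_is_visible("User-agent: *\nDisallow: /old-page/\n", URL, PAGE))
```

The second call returns False because the page is disallowed, which is the situation to avoid: the noindex is present but no crawler will ever read it.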
Robots.txt is also not a security measure. It is a public file — anyone can read it by visiting yoursite.com/robots.txt. Pages blocked in robots.txt are not hidden from users or from anyone who knows the URL. For genuinely sensitive content, server-level authentication or access controls are required.
Robots.txt and Crawl Budget
For large websites (eCommerce stores with thousands of product pages, news sites with tens of thousands of articles, corporate sites with extensive documentation libraries), crawl budget management becomes a meaningful SEO consideration. Google will only crawl so many URLs on a given site within a given period, a limit commonly called crawl budget. If that budget is spent crawling low-value pages such as filtered URL variants, admin pages, development staging areas, and session-based URLs, the high-value pages that need to be crawled and indexed regularly may be reached less frequently.
Robots.txt is one tool for directing crawl budget toward high-value content by preventing crawlers from spending time on low-value areas. Common categories of pages worth disallowing include admin and login pages, internal search result pages, cart and checkout pages that should not be indexed, development or staging directories that should never be live on a production domain, and URL parameters that create duplicate versions of existing pages.
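As a rough illustration of the categories above, the sketch below builds a robots.txt from a list of low-value paths and then confirms that product pages stay crawlable while cart and internal search pages do not. The path names, the helper function build_robots_txt, and the sitemap URL are all hypothetical; substitute whatever your own platform uses. Parameter-based duplicates are left out because blocking them usually relies on wildcard patterns, which Python's standard parser does not evaluate the way Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical low-value paths; substitute the ones your own platform uses.
LOW_VALUE_PATHS = ["/admin/", "/login/", "/search/", "/cart/", "/checkout/", "/staging/"]


def build_robots_txt(disallow_paths, sitemap_url=None):
    """Return robots.txt text that disallows the given paths for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(LOW_VALUE_PATHS, "https://www.example.com/sitemap.xml")

# Parse the generated file and spot-check a few representative paths.
rules = RobotFileParser()
rules.parse(robots_txt.splitlines())

for path in ["/products/blue-widget/", "/cart/", "/search/widgets/"]:
    print(path, "crawlable" if rules.can_fetch("*", path) else "blocked")
```

Generating the file from a reviewed list, rather than editing it by hand, makes it harder for a stray rule to slip in unnoticed.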
Common Robots.txt Mistakes That Hurt SEO
Accidentally Blocking the Entire Site
The most catastrophic robots.txt mistake — and one that happens more often than it should — is a rule that disallows all crawlers from accessing all pages of the site. This typically happens during a website build or redesign, when a developer adds a blanket disallow rule to prevent the site from being indexed while it is under construction, and then forgets to remove it before launch.
The disallow-all rule is just two lines in the file: User-agent: * followed by Disallow: /. When the site goes live with this rule in place, Google cannot crawl any of it. Organic traffic collapses, sometimes within days as Google discovers the file during its next crawl cycle. If you have ever experienced a sudden, complete loss of organic traffic with no obvious cause, check robots.txt immediately.
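A pre-launch check for this mistake can be very small. The sketch below asks whether a given crawler is allowed to fetch the homepage; under the prefix matching that Python's standard parser uses, a disallowed homepage means a blanket Disallow: / applies to that crawler. The function name blocks_entire_site is an assumption for illustration, and the check could be wired into a deployment pipeline as a failing test.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt, agent="Googlebot"):
    """True if the given crawler is not allowed to fetch the homepage."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, "/")


# The leftover staging rule described above.
staging_rules = "User-agent: *\nDisallow: /\n"
# A typical production file that only blocks an admin area (hypothetical path).
production_rules = "User-agent: *\nDisallow: /admin/\n"

print(blocks_entire_site(staging_rules))
print(blocks_entire_site(production_rules))
```

The first call returns True and the second returns False. An empty Disallow: line blocks nothing, so it also passes the check.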
Blocking CSS and JavaScript Files
Google needs to render your pages — to see them as a user sees them — to fully evaluate their content. Rendering requires loading the CSS and JavaScript that controls how the page displays. If your robots.txt blocks Googlebot from accessing style sheets or scripts, Google may see a broken, unstyled version of your pages and fail to understand their full content.
Modern websites should ensure their robots.txt does not block the CSS and JavaScript directories. You can verify how Google sees your pages using the URL Inspection tool in Google Search Console, which shows you a screenshot of how Googlebot renders each page.
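Before reaching for Search Console, you can run a quick local check on the asset URLs your pages load. The sketch below lists which of a set of CSS and JavaScript URLs a crawler would be barred from fetching. The asset paths, the rules, and the helper name blocked_assets are hypothetical, and the standard-library parser only approximates Googlebot, so the URL Inspection tool remains the authoritative check.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, agent="Googlebot"):
    """Return the asset URLs that the given crawler is not allowed to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not rules.can_fetch(agent, url)]


# Hypothetical rules that accidentally block the script directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
Disallow: /admin/
"""

# Hypothetical asset URLs referenced by a page template.
ASSETS = [
    "https://www.example.com/assets/css/main.css",
    "https://www.example.com/assets/js/app.js",
]

print(blocked_assets(ROBOTS_TXT, ASSETS))
```

With these sample rules the script file is reported as blocked while the stylesheet is not, which is exactly the kind of partial block that produces a half-rendered page.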
Blocking Pages That Have External Links
When pages with inbound external links are blocked in robots.txt, the link equity those links carry cannot pass to the rest of your site because Googlebot cannot crawl the blocked page to follow its internal links. This is link equity that disappears entirely rather than flowing through to your important pages. If you need to deindex a page that has external links, use a noindex tag rather than robots.txt — this allows Google to crawl the page and follow its links while not indexing the page itself.
Inconsistency With Sitemap
Your XML sitemap and your robots.txt file need to be consistent. Including a URL in your sitemap signals to Google that you want it indexed. Simultaneously blocking that URL in robots.txt sends a contradictory signal. The sitemap should only include URLs that robots.txt does not block — any page you have disallowed in robots.txt should not appear in your sitemap.
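The consistency rule above lends itself to an automated check. The sketch below reads the loc entries from a sitemap and returns any that robots.txt disallows. The sample sitemap, the rules, and the helper name sitemap_conflicts are assumptions for illustration; it handles a plain urlset sitemap only, not a sitemap index file, and it uses Python's standard parser, which approximates rather than reproduces Google's rule matching.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_conflicts(robots_txt, sitemap_xml, agent="Googlebot"):
    """Return sitemap URLs that robots.txt tells the crawler not to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    return [url for url in urls if not rules.can_fetch(agent, url)]


# Hypothetical inputs: the sitemap lists a cart page that robots.txt blocks.
ROBOTS_TXT = "User-agent: *\nDisallow: /cart/\n"
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products/blue-widget/</loc></url>
  <url><loc>https://www.example.com/cart/</loc></url>
</urlset>"""

print(sitemap_conflicts(ROBOTS_TXT, SITEMAP_XML))
```

Any URL this returns should either be removed from the sitemap or unblocked in robots.txt, depending on whether you actually want it indexed.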
How to Check and Validate Your Robots.txt
Access your current robots.txt file by visiting yoursite.com/robots.txt directly in a browser. Google Search Console includes a robots.txt report under Settings that shows your current file and highlights any syntax errors. Google’s URL Inspection tool can be used to test whether a specific URL is accessible to Googlebot, which helps verify that your robots.txt rules are working as intended.
After any change to robots.txt, use the Request a recrawl option in the Search Console robots.txt report to prompt Google to re-read the file promptly. Otherwise Google may keep working from its cached copy, which it generally refreshes within about 24 hours.
Robots.txt as Part of Technical SEO
Robots.txt configuration is one component of the broader technical foundation that determines how efficiently Google can crawl and index your site. A well-configured robots.txt, combined with a clean XML sitemap, correct canonical tags, and proper noindex implementation where needed, creates a coherent set of instructions that allows Google to spend its crawl resources on the pages that matter most.
A technical SEO audit reviews your robots.txt alongside all other crawlability and indexation signals, identifying contradictions between your crawl instructions, sitemap, and noindex tags that may be suppressing organic performance without any visible surface-level cause. These inconsistencies are common and are among the most fixable technical problems when found — which makes robots.txt review one of the highest-value, lowest-effort checks in any technical audit.

