1. What robots.txt does

The robots.txt file sits at the root of your domain and tells search-engine crawlers which parts of your site they may or may not request. Its main job is managing crawl traffic — keeping bots out of admin areas, search-result pages, or other low-value sections so they spend their time on pages that matter.
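As a rough illustration of how a crawler reads these rules, the sketch below feeds a small robots.txt to Python's standard-library parser and asks whether a few URLs may be requested. The rules and the example.com paths are assumptions made up for the example, and the standard-library parser uses simple prefix matching, so it only approximates how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: keep bots out of an admin
# area and internal search-result pages, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if user_agent may request url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post", "/admin/login", "/search?q=shoes"):
        url = "https://example.com" + path
        print(path, "->", "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked")
```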

One crucial nuance, straight from Google's robots.txt introduction: robots.txt is not a tool for hiding a page from Google. A URL blocked in robots.txt can still appear in results if other pages link to it; Google just won't see its content. To truly keep a page out of search, use a noindex tag instead, and leave that page crawlable, because Google has to fetch the page to see the tag.
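To confirm that a page actually carries a noindex directive, you can look for a robots meta tag in its HTML. The sketch below is a minimal check using only the standard library; the class and function names are hypothetical, and it deliberately ignores the X-Robots-Tag HTTP header, which is another way to send the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

Run it against the HTML of a page you want kept out of results; a False result means the tag is missing or misspelled.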

2. robots.txt: the costly mistakes

Because one line can affect your whole site, robots.txt is where small errors do big damage:

  • Disallowing everything. A stray Disallow: / (often left over from a staging site) blocks the entire site from crawling. Always check this after a launch or migration.
  • Blocking CSS or JS. If you block the resources Google needs to render the page, it may misjudge your content and mobile-friendliness.
  • Using it to “hide” pages. robots.txt blocks crawling, not indexing, so a blocked URL can still show up in results. Pages that must stay out of search need noindex instead.
  • Forgetting the sitemap line. Adding Sitemap: https://yoursite.com/sitemap.xml helps crawlers find your sitemap.
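Several of the mistakes above can be caught with a quick automated check. The following is a minimal sketch, assuming Python 3.9+ and the standard-library robots.txt parser; the function name, the warning strings, and the asset paths you pass in are all hypothetical. The parser does not implement Google's wildcard matching, so treat the output as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, asset_urls: list[str],
                 user_agent: str = "Googlebot") -> list[str]:
    """Return warnings for a site-wide block, blocked assets, or a missing Sitemap line."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    warnings = []

    # A stray "Disallow: /" makes even the homepage unfetchable.
    if not parser.can_fetch(user_agent, "/"):
        warnings.append("site root is disallowed")

    # CSS and JS files the page needs for rendering should stay crawlable.
    for url in asset_urls:
        if not parser.can_fetch(user_agent, url):
            warnings.append(f"blocked asset: {url}")

    # site_maps() returns None when the file has no Sitemap: lines.
    if not parser.site_maps():
        warnings.append("no Sitemap line")

    return warnings
```

For example, passing the contents of a leftover staging file such as "User-agent: *" followed by "Disallow: /" reports all three problems at once.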

3. What an XML sitemap does

If robots.txt is the “keep out” sign, an XML sitemap is the map you hand Google of everything you do want found. It's a file listing your important URLs, helping search engines discover pages they might otherwise miss — especially on large sites, new sites with few backlinks, or pages buried deep in your structure. Google's sitemaps overview explains when a sitemap helps most and what to include.

A sitemap doesn't guarantee indexing; it's a recommendation, not a command. It does make discovery faster and more reliable, and once submitted it gives you a place in Search Console to see whether Google could read the file.
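For reference, a sitemap in its simplest form is a urlset element containing one url entry, with a loc child, per page. The sketch below builds such a file with Python's standard library; the function name and the example URLs are assumptions for illustration, and optional fields such as lastmod are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal XML sitemap listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    data = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return data.decode("utf-8")


if __name__ == "__main__":
    print(build_sitemap([
        "https://example.com/",
        "https://example.com/pricing",
    ]))
```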

4. Building and submitting your sitemap

Keep the sitemap clean: include only canonical, indexable URLs (no redirects, no noindex pages, no broken links), and keep it current as you add or remove pages. Once it's live:

  • Confirm it actually returns valid XML when you open it in a browser — a server misconfiguration that serves your homepage instead is a classic reason Google reports “couldn't fetch.”
  • Reference it in robots.txt with a Sitemap: line.
  • Submit it in Google Search Console under Sitemaps, and check it reports Success with the expected URL count.
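The first check in the list above can also be scripted. This sketch takes the raw bytes your server returned for the sitemap URL (fetching is left to you) and reports whether they parse as a sitemap or look like something else, such as an HTML page. The function name and message wording are hypothetical, and it checks only well-formedness and the root element, not full schema validity.

```python
import xml.etree.ElementTree as ET

# Tag prefix for the sitemaps.org namespace, in ElementTree's {uri}name form.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_body(body: bytes) -> tuple[bool, str]:
    """Return (ok, message) for a response body fetched from a sitemap URL."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        # Typical for an HTML homepage served in place of the sitemap.
        return False, f"not well-formed XML: {exc}"

    if root.tag not in (NS + "urlset", NS + "sitemapindex"):
        return False, f"unexpected root element: {root.tag}"

    # Count <url> entries (sitemap) or <sitemap> entries (sitemap index).
    count = len(root.findall(NS + "url")) + len(root.findall(NS + "sitemap"))
    return True, f"entries: {count}"
```

Comparing the reported entry count with the number of pages you expect is a cheap way to spot a truncated or stale file before submitting it.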

You can generate a correct robots.txt and sitemap setup in seconds with the Robots & Sitemap Tool.

5. How they work together

Think of them as a pair: robots.txt manages where crawlers can go, and the sitemap tells them what's worth finding. Used well, they guide Google efficiently — spending crawl effort on your valuable pages and pointing it straight at new ones. Used carelessly, they're one of the most common reasons pages never get indexed.
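One way the two files work against each other is when the sitemap lists a URL that robots.txt blocks. The sketch below cross-checks them, assuming you already have both files' contents in hand; the function name is hypothetical, and because the standard-library parser uses plain prefix matching, results for rules with wildcards may differ from Google's.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def blocked_sitemap_urls(robots_txt: str, sitemap_xml: bytes,
                         user_agent: str = "Googlebot") -> list[str]:
    """Return sitemap URLs that robots.txt forbids user_agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(LOC_TAG) if el.text]

    # Any URL listed here is both recommended and forbidden at once.
    return [url for url in locs if not parser.can_fetch(user_agent, url)]
```

An empty list means the two files agree; anything else is a page you are asking Google to find while also telling it to stay away.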

After any change to either file, confirm the result: check key pages with the indexing guide, and run a broader site audit to catch anything still being blocked.
