robots.txt & Sitemap Generator
Generate a clean robots.txt and sitemap.xml — ready to upload to your server.
What is robots.txt?
A plain text file that tells search engine crawlers which pages they can and cannot visit.
It goes in the root of your domain at yoursite.com/robots.txt.
Googlebot checks it before crawling any page on your site.
It's a request, not a hard block — malicious bots may ignore it.
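A minimal robots.txt looks like this (yoursite.com is a placeholder for your own domain):

```text
# Applies to all crawlers
User-agent: *
Disallow: /wp-admin/

# Tell crawlers where your sitemap lives
Sitemap: https://yoursite.com/sitemap.xml
```

The Sitemap line is optional but recommended: it lets crawlers find your sitemap even if you never submit it manually.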
What is sitemap.xml?
An XML file listing all the pages you want search engines to index.
Upload it to yoursite.com/sitemap.xml, then submit it in
Google Search Console under Sitemaps.
It helps Google discover new pages faster — especially important for large or new sites.
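A minimal sitemap.xml with a single URL looks like this (the domain and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <!-- lastmod is optional: the date the page last changed -->
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```

Add one &lt;url&gt; entry per page you want indexed. A single sitemap file can hold up to 50,000 URLs.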
Where to upload
Both files go in the root directory of your site — the same folder as your homepage.
In cPanel: public_html/robots.txt and public_html/sitemap.xml.
In WinSCP: upload to the root level, not inside any subfolder.
WordPress: the root is where wp-config.php lives.
What to disallow
Block pages you don't want indexed: /wp-admin/, /wp-login.php,
/cart/, /checkout/, /thank-you/,
/search/ (prevents duplicate content from search result pages),
and any /staging/ or /test/ directories.
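Put together, the disallow rules above look like this in robots.txt. The Allow line is a common WordPress addition (admin-ajax.php is used by front-end features, so it shouldn't be blocked along with the rest of /wp-admin/):

```text
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /thank-you/
Disallow: /search/
Disallow: /staging/
Disallow: /test/
Allow: /wp-admin/admin-ajax.php
```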
Priority & changefreq
Priority (0.0–1.0) signals which pages are most important relative to the rest of your site: set your homepage to 1.0, main category pages to 0.8, and blog posts to 0.6–0.7. Changefreq hints how often a page's content changes (daily, weekly, monthly). Google has said it ignores both fields, but other search engines may still read them, and they cost nothing to include.
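A sitemap entry for a blog post using the values above might look like this (the URL is a placeholder):

```xml
<url>
  <loc>https://yoursite.com/blog/example-post/</loc>
  <changefreq>monthly</changefreq>
  <priority>0.6</priority>
</url>
```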
Blocking AI bots
AI training bots like GPTBot (OpenAI) and Claude-Web (Anthropic) crawl your content to train AI models. You can block them without affecting Google or Bing rankings. Google-Extended specifically feeds Google's Gemini — blocking it doesn't affect search rankings.
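To block these bots, add a separate User-agent group for each one in robots.txt. These rules sit alongside your normal rules and don't affect Googlebot or Bingbot:

```text
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Google-Extended
Disallow: /
```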
Submitting your sitemap
In Google Search Console, open the Sitemaps report and enter the URL of your sitemap.xml. Google will show you how many URLs were submitted vs indexed; a gap there tells you about crawl budget or indexing issues.