Robots.txt Generator
Create a valid robots.txt file for your website
Note: Googlebot ignores Crawl-delay. Google retired the Search Console crawl-rate limiter in early 2024, so Googlebot's crawl rate can no longer be set manually there.
Generated robots.txt
User-agent: *
Disallow:
No upload: runs entirely in your browser. Your data never leaves your device.
What is a robots.txt File?
A robots.txt file is a plain text file you place in the root of your website that communicates crawl instructions to web robots, primarily search engine crawlers like Googlebot and Bingbot. It follows the Robots Exclusion Protocol (REP), standardized as RFC 9309, which responsible crawlers honor by default.
Before crawling your site, a compliant crawler fetches https://yourdomain.com/robots.txt, reads your directives, and decides which parts of your site it may request. Crawlers usually cache the file rather than refetching it on every visit (Google generally caches it for up to 24 hours), so changes can take a while to apply. This makes robots.txt one of the first SEO configurations you should set up on any website.
Common uses include keeping crawlers out of admin panels (/admin), WordPress login pages (/wp-login.php), internal search results (/?s=), staging areas, and private APIs. Note that robots.txt controls crawling, not indexing: a blocked URL can still be indexed if other pages link to it. The file also lets you point crawlers directly to your sitemap so they discover new pages faster.
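As a rough illustration of how a crawler applies such rules, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` and checks a few paths. The file contents, the `example.com` domain, the `ExampleBot` agent name, and the `is_allowed` helper are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (not taken from a real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /wp-login.php

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, "https://example.com" + path)


admin_ok = is_allowed("/admin/users")    # falls under the /admin prefix
login_ok = is_allowed("/wp-login.php")   # matches a Disallow rule exactly
blog_ok = is_allowed("/blog/hello")      # no rule matches this path
sitemaps = parser.site_maps()            # Sitemap lines (Python 3.8+)
```

Here `admin_ok` and `login_ok` come out `False` and `blog_ok` comes out `True`, which matches the prefix-based blocking described above.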
How to Use This Robots.txt Generator
- 1. Start with a preset. "Default (Allow All)" works for most new sites. Choose "Block AI Crawlers" if you want to prevent AI training bots from harvesting your content.
- 2. Add or edit User-Agent rules. Use "*" (asterisk) to target all crawlers, or pick a specific bot from the dropdown. Each rule group applies to one user-agent.
- 3. Enter paths in the Disallow field, one per line. These are paths you want crawlers to skip. Leave Disallow empty to allow full crawling for that agent.
- 4. Use the Allow field for exceptions: if you disallow a parent path but want a specific subdirectory to be crawlable, add it here. When an Allow rule and a Disallow rule both match a URL, Google and other RFC 9309 crawlers apply the more specific (longer) rule, and Allow wins a tie.
- 5. Add your sitemap URL so crawlers can discover all your pages directly. This is optional but strongly recommended.
- 6. Click "Copy" or "Download robots.txt", then upload the file to your website's root directory (same level as your index.html or homepage).
Verify the file is live by visiting https://yourdomain.com/robots.txt in your browser. Then submit your sitemap in Google Search Console to help Googlebot discover your pages sooner.
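The steps above amount to assembling a few lines of text per user-agent. The following is a minimal sketch of that assembly, not the generator's actual code; `RuleGroup` and `build_robots_txt` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuleGroup:
    """One user-agent block: an agent name plus its path rules."""
    user_agent: str
    disallow: List[str] = field(default_factory=list)
    allow: List[str] = field(default_factory=list)


def build_robots_txt(groups: List[RuleGroup],
                     sitemaps: Optional[List[str]] = None) -> str:
    """Render rule groups and sitemap URLs as robots.txt text."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {group.user_agent}"]
        lines += [f"Allow: {path}" for path in group.allow]
        # An empty Disallow line means "nothing is blocked" for this agent.
        lines += [f"Disallow: {path}" for path in group.disallow] or ["Disallow:"]
        blocks.append("\n".join(lines))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # Blank lines separate the blocks; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt(
    [RuleGroup("*", disallow=["/admin", "/wp-login.php"], allow=["/admin/public"])],
    sitemaps=["https://example.com/sitemap.xml"],
)
```

Writing `text` to a file named robots.txt in the site root corresponds to step 6.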
Understanding robots.txt Syntax: Key Directives Explained
Robots.txt has a simple but precise syntax. Each directive sits on its own line, and a blank line separates one user-agent block from another.
- User-agent: Specifies which crawler the following rules apply to. Use * (wildcard) to target all crawlers, or a specific name like Googlebot for Google only. If you have both specific and wildcard rules, a crawler uses the most specific block that matches its user-agent and ignores the others.
- Disallow: Tells crawlers not to access a URL path. Disallow: /admin blocks every path that starts with /admin, including /admin/users and /admin/settings. Disallow: / blocks the entire site. An empty Disallow: (no path) means nothing is blocked.
- Allow: Explicitly permits a path even if a broader Disallow rule would block it. Allow is part of RFC 9309 and is supported by Google, Bing, and most major crawlers. These crawlers apply the most specific (longest) matching rule regardless of order; listing Allow rules before Disallow rules is still a safe habit for older parsers that stop at the first match. Example: Allow: /products/featured with Disallow: /products.
- Crawl-delay: Asks a crawler to wait N seconds between requests. Helps prevent server overload from aggressive crawling. Googlebot ignores this directive. Bingbot honors it, but support among other crawlers varies.
- Sitemap: Points crawlers to your XML sitemap so they can discover all your pages. Multiple Sitemap lines are valid. The directive is independent of user-agent blocks and may appear anywhere in the file; placing it at the end is a common convention.
Robots.txt path matching is case-sensitive regardless of the server's operating system: Disallow: /Admin does not match /admin. Always use the exact casing of your actual URLs. Google, Bing, and other RFC 9309 crawlers also support a wildcard asterisk (*) to match any sequence of characters, and a dollar sign ($) to anchor a pattern to the end of a URL.
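To make the matching rules concrete, here is a small sketch of longest-match precedence with `*` and `$` support, in the spirit of RFC 9309 and Google's documented behavior. It is an illustration under simplifying assumptions (no percent-encoding normalization, a single rule group); `pattern_to_regex` and `is_allowed` are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list) -> bool:
    """rules is a list of ("allow" | "disallow", pattern) pairs."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start, giving prefix semantics.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; on equal length, allow (True) wins.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [
    ("disallow", "/products"),
    ("allow", "/products/featured"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/Admin"),
]
```

With these rules, `/products/featured/shoes` is allowed because the Allow pattern is longer, `/products/sale` is blocked, `/docs/report.pdf` is blocked while `/docs/report.pdf?page=2` is not (the `$` anchor), and `/admin` stays allowed because matching is case-sensitive.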
Why Block AI Crawlers in 2025, and Which Ones to Target
Since 2023, a new category of web crawlers has emerged: AI training bots that scrape content to train large language models. Unlike traditional search engine crawlers, these bots do not bring referral traffic back to your site: they consume your content and return nothing in exchange.
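The most commonly blocked AI crawlers are: GPTBot (OpenAI's crawler for collecting model training data), CCBot (Common Crawl, whose datasets are used by many AI companies), Claude-Web and Anthropic-AI (Anthropic), PerplexityBot (Perplexity AI search), OAI-SearchBot (OpenAI's search product), and Applebot-Extended (a token for opting out of Apple's AI training use). PerplexityBot and OAI-SearchBot are search crawlers rather than training bots, so blocking them can also keep your pages out of those products' results.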
Use the "Block AI Crawlers" preset in this generator to add all of these with a single click. The preset places each AI bot in its own block with Disallow: /, then leaves a final User-agent: * block with an empty Disallow so that Google, Bing, and other traditional search engines can still crawl freely.
Keep in mind that robots.txt is voluntary: a well-behaved crawler respects it, but a poorly coded or malicious scraper will not. For stronger protection against aggressive scrapers, complement robots.txt with Cloudflare Bot Fight Mode, firewall rules blocking specific user-agent strings, or rate limiting at the server level.
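The preset layout described above can be sketched as follows. This is an approximation rather than the generator's real output: the agent list is copied from the text, `block_ai_preset` is a hypothetical helper, and the check uses Python's standard `urllib.robotparser` with an assumed `example.com` URL.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above; extend or trim as needed.
AI_AGENTS = (
    "GPTBot", "CCBot", "Claude-Web", "Anthropic-AI",
    "PerplexityBot", "OAI-SearchBot", "Applebot-Extended",
)


def block_ai_preset(agents=AI_AGENTS) -> str:
    """One 'Disallow: /' block per AI agent, then an allow-all block."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    blocks.append("User-agent: *\nDisallow:")
    return "\n\n".join(blocks) + "\n"


preset = block_ai_preset()

# Sanity check: AI bots are blocked, traditional crawlers are not.
parser = RobotFileParser()
parser.parse(preset.splitlines())
gptbot_ok = parser.can_fetch("GPTBot", "https://example.com/article")
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/article")
```

Under this sketch `gptbot_ok` is `False` and `googlebot_ok` is `True`. That only describes what a compliant crawler would do; it says nothing about scrapers that ignore the file.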
Common robots.txt Mistakes That Hurt SEO
- Accidentally blocking your entire site. Disallow: / for User-agent: * blocks all crawlers from everything. This is one of the most damaging SEO mistakes: once crawlers can no longer fetch your pages, they can start dropping out of search results. Always preview your output before deploying.
- Blocking CSS, JS, or image files. Google needs to render your pages to understand them. Blocking stylesheet or JavaScript paths (e.g., Disallow: /assets) prevents Googlebot from fully rendering your pages, which can hurt rankings.
- Relying on robots.txt for security. Robots.txt is a public file. Any human or bot can read it. Listing /admin or /api in Disallow does not hide those paths; it publicly advertises them. Use authentication and access controls for actual security.
- Wrong file location or name. The file must be named exactly robots.txt (lowercase) and placed at the root of your domain, not in a subdirectory. example.com/robots/robots.txt will be ignored; only example.com/robots.txt is read.
- No sitemap directive. Omitting the Sitemap line means crawlers have to discover your pages by following links. Adding a Sitemap line helps crawlers find the URLs you want indexed sooner.
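Some of these mistakes can be caught with a simple pre-deployment check. The sketch below is a hypothetical linter, not part of this tool: `lint_robots_txt` and its warning messages are made up for illustration, and the list of "asset" prefixes is an assumption you would adapt to your own site.

```python
# Assumed path prefixes that usually hold rendering resources.
ASSET_PREFIXES = ("/assets", "/css", "/js")


def lint_robots_txt(text: str) -> list:
    """Return warnings for a few common robots.txt mistakes."""
    warnings = []
    agents = []             # user-agents of the group being read
    reading_agents = False  # True while consecutive User-agent lines continue
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not reading_agents:
                agents = []  # a new group starts here
            agents.append(value)
            reading_agents = True
            continue
        reading_agents = False
        if key == "sitemap":
            has_sitemap = True
        elif key == "disallow":
            if value == "/" and "*" in agents:
                warnings.append("Disallow: / under User-agent: * blocks the whole site")
            elif value.lower().startswith(ASSET_PREFIXES):
                warnings.append(f"Disallow: {value} may block files needed for rendering")
    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


risky = lint_robots_txt("User-agent: *\nDisallow: /\n")
```

For the input shown, `risky` contains the whole-site warning and the missing-sitemap warning. File location and naming still need to be checked by fetching the live URL.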
Robots.txt vs. Meta Robots Tag: When to Use Which
Robots.txt and the HTML meta robots tag both control how search engines interact with your content, but they work at different levels and have different strengths.
Robots.txt operates at the URL path level: it tells crawlers whether to fetch a page at all. Use it to block entire directories (/admin/, /staging/) or specific file types that don't need crawling. Because a blocked page is never fetched, search engines cannot read or index its content. However, if the page is linked from elsewhere, Google may still list its bare URL in search results, just without a snippet.
The meta robots tag (<meta name="robots" content="noindex">) is placed inside an individual page's HTML. It tells Googlebot: "you can fetch this page, but don't add it to your index." Use this for pages like thank-you pages, duplicate content, or low-value pages where you want the crawler to see the page (to follow links or respect canonical tags) but not index it.
A common mistake: blocking a page in robots.txt AND adding noindex in the HTML. If Googlebot is blocked from fetching the page, it can never see the noindex tag, so the noindex instruction is useless. Use robots.txt to block entire sections that have no SEO value at all. Use meta noindex for pages you want crawled but not indexed.
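The conflict described above can be detected mechanically. The following sketch uses Python's standard `html.parser` to look for a noindex directive in a page's meta robots tag; `MetaRobotsFinder`, `has_noindex`, and `noindex_is_hidden` are hypothetical names, and whether the URL is blocked by robots.txt is passed in as a plain boolean for simplicity.

```python
from html.parser import HTMLParser


class MetaRobotsFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    """True if the page carries a meta robots noindex directive."""
    finder = MetaRobotsFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def noindex_is_hidden(blocked_by_robots_txt: bool, html: str) -> bool:
    """True when a noindex tag exists but crawlers are blocked from seeing it."""
    return blocked_by_robots_txt and has_noindex(html)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
conflict = noindex_is_hidden(True, page)
```

Here `conflict` is `True`: the page asks not to be indexed, but a crawler that obeys robots.txt would never fetch it to read that request.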