nutilz

Robots.txt Generator

Create a valid robots.txt file for your website

Note: Googlebot ignores Crawl-delay. Google retired the Search Console crawl rate setting in January 2024; to slow Googlebot temporarily, have your server return 500, 503, or 429 responses.

Generated robots.txt

User-agent: *
Disallow:

No upload: runs entirely in your browser. Your data never leaves your device.

What is a robots.txt File?

A robots.txt file is a plain text file you place in the root of your website that communicates crawl instructions to web robots, primarily search engine crawlers like Googlebot and Bingbot. It follows the Robots Exclusion Protocol (REP), a standard that responsible crawlers honor by default.

Before crawling your site, a compliant crawler fetches https://yourdomain.com/robots.txt and uses your directives to decide which parts of your site to fetch. Crawlers usually cache the file rather than re-reading it on every request (Google generally caches it for up to 24 hours), so changes can take a while to be picked up. This makes robots.txt one of the first and most fundamental SEO configurations you should set up on any website.

Common uses include keeping crawlers out of admin panels (/admin), WordPress login pages (/wp-login.php), internal search results (/?s=), staging areas, and private APIs. It also lets you point crawlers directly to your sitemap so they discover new pages faster.

How to Use This Robots.txt Generator

  1. Start with a preset: "Default (Allow All)" works for most new sites. Choose "Block AI Crawlers" if you want to prevent AI training bots from harvesting your content.
  2. Add or edit User-Agent rules. Use "*" (asterisk) to target all crawlers, or pick a specific bot from the dropdown. Each rule group applies to one user-agent.
  3. Enter paths in the Disallow field, one per line. These are paths you want crawlers to skip. Leave Disallow empty to allow full crawling for that agent.
  4. Use the Allow field for exceptions: if you disallow a parent path but want a specific subdirectory to be crawlable, add it here. When an Allow and a Disallow rule both match a URL, the more specific (longer) rule wins, and Allow wins a tie.
  5. Add your sitemap URL so crawlers can discover all your pages directly. This is optional but strongly recommended.
  6. Click "Copy" or "Download robots.txt", then upload the file to your website's root directory (same level as your index.html or homepage).

Verify the file is live by visiting https://yourdomain.com/robots.txt in your browser. Then submit your sitemap in Google Search Console so Googlebot can discover your pages sooner.
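
As a rough illustration of what the steps above produce, here is a minimal Python sketch that assembles robots.txt text from rule groups. The function name build_robots_txt and the dictionary keys are hypothetical, chosen for this example; this is not the generator's actual source code.

```python
from typing import Dict, List, Optional


def build_robots_txt(groups: List[Dict], sitemaps: Optional[List[str]] = None) -> str:
    """Assemble robots.txt text from rule groups (hypothetical structure).

    Each group is a dict with the keys "user_agent" (str) and, optionally,
    "allow" and "disallow" (lists of paths).
    """
    blocks = []
    for group in groups:
        lines = [f"User-agent: {group['user_agent']}"]
        allow = group.get("allow", [])
        disallow = group.get("disallow", [])
        # Allow lines are written first, by convention.
        for path in allow:
            lines.append(f"Allow: {path}")
        if not allow and not disallow:
            # An empty Disallow means nothing is blocked for this agent.
            lines.append("Disallow:")
        for path in disallow:
            lines.append(f"Disallow: {path}")
        blocks.append("\n".join(lines))
    if sitemaps:
        # Sitemap lines stand outside any user-agent block.
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # A blank line separates one block from the next.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        [{"user_agent": "*", "disallow": ["/admin/"], "allow": ["/admin/help/"]}],
        sitemaps=["https://yourdomain.com/sitemap.xml"],
    ))
```

Calling build_robots_txt([{"user_agent": "*"}]) yields the same two-line "allow all" file shown in the generator's default output.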

Understanding robots.txt Syntax: Key Directives Explained

Robots.txt has a simple but precise syntax. Each directive sits on its own line, and a blank line separates one user-agent block from another.

  • User-agent: Specifies which crawler the following rules apply to. Use * (wildcard) to target all crawlers, or a specific name like Googlebot for Google only. If you have both specific and wildcard rules, crawlers use the most specific block that matches their user-agent.
  • Disallow: Tells crawlers not to access a URL path. Disallow: /admin blocks everything under /admin including /admin/users, /admin/settings, etc. Because rules match by prefix, it also blocks /administrator; use Disallow: /admin/ to limit the rule to the directory. Disallow: / blocks the entire site. An empty Disallow: (no path) means nothing is blocked.
  • Allow: Explicitly permits a path even if a broader Disallow rule would block it. Allow is part of the standardized protocol (RFC 9309) and is supported by Google and Bing, though some older or simpler crawlers may ignore it. Googlebot applies the most specific (longest) matching rule regardless of order; listing Allow before Disallow is still a sensible convention because some simpler parsers stop at the first match. Example: Allow: /products/featured with Disallow: /products.
  • Crawl-delay: Asks a crawler to wait N seconds between requests. Helps prevent server overload from aggressive crawling. Googlebot ignores this directive, and the Search Console crawl rate setting that used to be the alternative was retired in January 2024. Bingbot honors Crawl-delay; support among other crawlers varies.
  • Sitemap: Points crawlers to your XML sitemap so they can discover all your pages. Use an absolute URL. Multiple Sitemap lines are valid. The line is independent of user-agent blocks and can appear anywhere in the file, though it is usually placed at the end.
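
One way to see how these directives are read is Python's standard-library urllib.robotparser. The sketch below is illustrative only: example.com, the paths, and the bot name SomeOtherBot are placeholders, not values from this tool. Note that, at the time of writing, this particular parser applies the first matching rule in file order rather than the longest match, which is why the Allow line is listed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com and all paths are placeholders.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /staging/

User-agent: *
Allow: /products/featured
Disallow: /products
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own block, so the "*" rules do not apply to it.
googlebot_products = parser.can_fetch("Googlebot", "https://example.com/products/shoes")
# Crawlers without a dedicated block fall back to the "*" block.
other_products = parser.can_fetch("SomeOtherBot", "https://example.com/products/shoes")
other_featured = parser.can_fetch("SomeOtherBot", "https://example.com/products/featured")

print(googlebot_products, other_products, other_featured)
print(parser.crawl_delay("SomeOtherBot"))  # Crawl-delay from the "*" block
print(parser.site_maps())                  # Sitemap URLs listed in the file
```

Here Googlebot may fetch /products/shoes because only /staging/ is disallowed in its block, while other crawlers are kept out of /products except for /products/featured.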

Paths in robots.txt rules are case-sensitive for crawlers, whatever operating system your server runs: Disallow: /Admin does not match /admin. Always use the exact casing of your actual URLs. Paths can use a wildcard asterisk (*) to match any sequence of characters, and a dollar sign ($) to anchor to the end of a URL.
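
The matching rules can be sketched in a few lines of Python. This is a simplified illustration of wildcard matching combined with longest-match precedence (the approach described in RFC 9309 and in Google's documentation), not any crawler's real implementation; it ignores percent-encoding, and the names pattern_to_regex, is_allowed, and RULES are hypothetical.

```python
import re
from typing import List, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any sequence of characters and a trailing "$" anchors the
    pattern to the end of the URL. Everything else is matched literally and
    case-sensitively, starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with ".*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Apply longest-match precedence to (directive, pattern) pairs.

    The longest matching pattern wins; on a tie, "allow" wins. A path that
    matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern or not pattern_to_regex(pattern).search(path):
            continue  # empty patterns match nothing
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical rules for one user-agent block.
RULES = [
    ("disallow", "/products"),
    ("allow", "/products/featured"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/Private/"),
]

for candidate in ["/products/shoes", "/products/featured/item",
                  "/docs/guide.pdf", "/docs/guide.pdf?download=1", "/private/notes"]:
    print(candidate, is_allowed(candidate, RULES))
```

With these rules, /private/notes stays crawlable because the rule is written as /Private/, and /docs/guide.pdf?download=1 is not caught by /*.pdf$ because the URL no longer ends in .pdf.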

Why Block AI Crawlers in 2025, and Which Ones to Target

Since 2023, a new category of web crawlers has emerged: AI training bots that scrape content to train large language models. Unlike traditional search engine crawlers, these bots typically do not send referral traffic back to your site: they consume your content and return little or nothing in exchange.

The most commonly blocked AI crawlers are: GPTBot (used by OpenAI to train ChatGPT and GPT-4), CCBot (Common Crawl, used by many AI companies), Claude-Web and Anthropic-AI (Anthropic), PerplexityBot (Perplexity AI search), OAI-SearchBot (OpenAI's search product), and Applebot-Extended (Apple's AI features).

Use the "Block AI Crawlers" preset in this generator to add all of these with a single click. The preset places each AI bot in its own block with Disallow: /, then leaves a final User-agent: * block with an empty Disallow so that Google, Bing, and other traditional search engines can still crawl freely.
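
The layout described above can be sketched as follows. This is a hypothetical reconstruction based only on that description, not the preset's actual code; the function name block_ai_crawlers is invented for the example, the bot list is copied from the text, and real-world user-agent tokens change over time.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above; verify them against each
# vendor's current documentation before relying on this list.
AI_BOTS = [
    "GPTBot", "CCBot", "Claude-Web", "Anthropic-AI",
    "PerplexityBot", "OAI-SearchBot", "Applebot-Extended",
]


def block_ai_crawlers(bots):
    """Return robots.txt text: one block-everything group per bot, then allow-all."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    # Final catch-all block with an empty Disallow: nothing is blocked.
    blocks.append("User-agent: *\nDisallow:")
    return "\n\n".join(blocks) + "\n"


robots_txt = block_ai_crawlers(AI_BOTS)
print(robots_txt)

# Sanity-check the result with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # listed bot
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # falls back to "*"
```

The check at the end confirms the intent: listed bots are refused everywhere, while crawlers without their own block fall through to the permissive "*" group.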

Keep in mind that robots.txt is voluntary: a well-behaved crawler respects it, but a poorly coded or malicious scraper will not. For stronger protection against aggressive scrapers, complement robots.txt with Cloudflare Bot Fight Mode, firewall rules blocking specific user-agent strings, or rate limiting at the server level.

Common robots.txt Mistakes That Hurt SEO

  • Accidentally blocking your entire site. Disallow: / for User-agent: * blocks all crawlers from everything. This is one of the most catastrophic SEO mistakes: pages can drop out of Google's results as they are recrawled. Always preview your output before deploying.
  • Blocking CSS, JS, or image files. Google needs to render your pages to understand them. Blocking stylesheet or JavaScript paths (e.g., Disallow: /assets) prevents Googlebot from fully rendering your pages, which can hurt rankings.
  • Relying on robots.txt for security. Robots.txt is a public file. Any human or bot can read it. Listing /admin or /api in Disallow does not hide those paths; it publicly advertises them. Use authentication and access controls for actual security.
  • Wrong file location or name. The file must be named exactly robots.txt (lowercase) and placed at the root of your domain, not in a subdirectory. example.com/robots/robots.txt will be ignored; only example.com/robots.txt is read.
  • No sitemap directive. Omitting the Sitemap line means crawlers have to discover your pages by following links. Adding a Sitemap line helps crawlers find the URLs you want indexed sooner.
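
Some of these mistakes can be caught automatically before you deploy. The following is a small, illustrative linter covering three of the checks above (a site-wide block, blocked CSS or JS patterns, and a missing Sitemap line). The function name lint_robots_txt and the warning messages are hypothetical, and the CSS/JS check is a rough heuristic that only looks at the file extension in the rule.

```python
from typing import List


def lint_robots_txt(text: str) -> List[str]:
    """Return warnings for a few common robots.txt mistakes (illustrative only)."""
    warnings = []
    agents: List[str] = []   # user-agents of the block being read
    in_rules = False         # True once the current block has a rule line
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new block
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append("Disallow: / under User-agent: * blocks the whole site")
            # Heuristic: a rule ending in .css or .js (ignoring a trailing "$").
            if field == "disallow" and value.rstrip("$").lower().endswith((".css", ".js")):
                warnings.append(f"Disallow: {value} may stop pages from rendering")
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


if __name__ == "__main__":
    for warning in lint_robots_txt("User-agent: *\nDisallow: /\n"):
        print("WARNING:", warning)
```

A check like this is no substitute for previewing the file yourself, but it can flag the most damaging slip, an unintended Disallow: / for all crawlers.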

Robots.txt vs. Meta Robots Tag: When to Use Which

Robots.txt and the HTML meta robots tag both control how search engines interact with your content, but they work at different levels and have different strengths.

Robots.txt operates at the URL path level: it tells crawlers whether to fetch a page at all. Use it to block entire directories (/admin/, /staging/) or specific file types that don't need crawling. Because it prevents the page from being fetched, crawlers never see the content of a blocked page. It does not guarantee the page stays out of the index, however: if a blocked page is linked from elsewhere, Google may still list its URL in search results, just without a snippet.

The meta robots tag (<meta name="robots" content="noindex">) is placed inside an individual page's HTML. It tells Googlebot: "you can fetch this page, but don't add it to your index." Use this for pages like thank-you pages, duplicate content, or low-value pages where you want the crawler to see the page (to follow links or respect canonical tags) but not index it.

A common mistake: blocking a page in robots.txt AND adding noindex in the HTML. If Googlebot is blocked from fetching the page, it can never see the noindex tag, so the noindex instruction is useless. Use robots.txt to block entire sections that have no SEO value at all. Use meta noindex for pages you want crawled but not indexed.
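
The conflict described above can be detected with a short script. This is a minimal sketch using only Python's standard library; the class and function names are hypothetical, and it inspects only the meta robots tag in the HTML you pass in (it does not fetch pages or look at HTTP headers).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_is_hidden(robots_txt: str, url: str, html: str,
                      agent: str = "Googlebot") -> bool:
    """True when the page says noindex but robots.txt stops `agent` from fetching it."""
    finder = MetaRobotsFinder()
    finder.feed(html)
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return "noindex" in finder.directives and not rules.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder page and rules for illustration.
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    blocked = "User-agent: *\nDisallow: /thank-you\n"
    print(noindex_is_hidden(blocked, "https://example.com/thank-you", page))
```

When the function returns True, the crawler cannot see the noindex tag; removing the Disallow rule for that URL lets the noindex take effect.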

Frequently Asked Questions

What is a robots.txt file?
A robots.txt file is a plain text file placed in the root directory of your website (e.g., https://yoursite.com/robots.txt) that tells web crawlers which pages or sections of your site they are allowed or not allowed to access. It follows the Robots Exclusion Protocol (REP), a standard web crawlers honor. Robots.txt is used to keep search engines from crawling low-value pages like admin panels, duplicate content, or staging environments. It also lets you control crawl budget on large sites by steering crawlers toward your most important pages.
How do I add a robots.txt file to my website?
To add a robots.txt file to your website: (1) Generate your file using this tool. (2) Save it as a plain text file named exactly robots.txt (lowercase, with the .txt extension). (3) Upload it to the root directory of your domain; it must be accessible at https://yourdomain.com/robots.txt. In WordPress, you can use an SEO plugin like Yoast or Rank Math to manage robots.txt without FTP access. After uploading, verify it is working by visiting your robots.txt URL directly in the browser.
What is the difference between Allow and Disallow in robots.txt?
Disallow tells a crawler not to access a specific path or set of paths. For example, Disallow: /admin tells crawlers to skip everything under /admin. Allow explicitly permits a path that would otherwise be blocked by a broader Disallow rule. For example, if you set Disallow: /products but want /products/featured to be crawlable, you would add Allow: /products/featured alongside the Disallow rule (listing it first is a common convention). When both an Allow and a Disallow rule match the same URL, the more specific (longer) rule wins in Googlebot's implementation; if the two are equally specific, Allow wins. An empty Disallow (Disallow:) means nothing is blocked, which is equivalent to allowing full crawling.
How do I block AI crawlers like GPTBot from my website?
To block AI training crawlers, add a dedicated user-agent block before your general crawl rules. Common AI crawlers include GPTBot (OpenAI/ChatGPT), CCBot (Common Crawl), Claude-Web (Anthropic), Omgilibot, PerplexityBot, and Applebot-Extended (Apple AI). To block them all, add a rule for each with Disallow: /. Use this generator's "Block AI Crawlers" preset to add all known AI crawlers at once. Note that robots.txt blocking is voluntary: malicious scrapers may ignore it. For stronger protection, use your web server's firewall or Cloudflare WAF rules.
What happens if I do not have a robots.txt file?
If you do not have a robots.txt file, most well-behaved crawlers, including Googlebot, will crawl your entire site freely. This is usually fine for small websites or sites where all pages are intended to be indexed. However, for larger sites, the absence of a robots.txt file means you have no control over crawl budget and may waste crawler resources on pages that add little SEO value, like pagination pages, search result pages, or admin interfaces. Adding even a minimal robots.txt with just a Sitemap directive is a best practice.
Does robots.txt protect my website from hackers?
No. Robots.txt is not a security mechanism. It is a public file that any person or bot can read, and it is voluntary: only well-behaved crawlers follow it. Listing paths in Disallow actually tells potential attackers exactly where your sensitive directories are. For real security, protect sensitive pages with authentication, server-side access controls, and firewall rules. Use robots.txt solely to guide search engines and manage crawl budget, never to hide sensitive content.
What is Crawl-delay and should I use it?
Crawl-delay instructs a crawler to wait a specified number of seconds between successive requests to your server. For example, Crawl-delay: 10 tells the crawler to wait 10 seconds between page fetches. This is useful for smaller servers that may get overloaded by aggressive crawling. Important note: Googlebot does not honor Crawl-delay, and the Search Console crawl rate setting that used to be the alternative was retired in January 2024; to slow Googlebot temporarily, Google's guidance is to return 500, 503, or 429 responses. Bingbot does honor Crawl-delay, and support among other crawlers varies. Use it only if your server is struggling to handle crawler traffic: unnecessarily high delays can slow down how quickly the search engines that honor it pick up new content.