Question 1

What is a robots.txt file?

Accepted Answer

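A robots.txt file is a plain text file placed in the root directory of your website (e.g., https://yoursite.com/robots.txt) that tells web crawlers which pages or sections of your site they may or may not crawl. It follows the Robots Exclusion Protocol (REP), standardized as RFC 9309, which well-behaved crawlers honor voluntarily. Robots.txt is typically used to keep crawlers away from low-value pages such as admin panels, duplicate content, or staging environments. Note that it controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it, so use a noindex meta tag or X-Robots-Tag header (on a page crawlers are allowed to fetch) when a page must stay out of the index. Robots.txt also helps you manage crawl budget on large sites by steering crawlers toward your most important pages.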
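As a rough illustration of how a crawler reads such a file, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module. The file contents, the ExampleBot name, and the URLs are hypothetical and chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up crawler name; it falls under the "*" group.
admin_ok = parser.can_fetch("ExampleBot", "https://yoursite.com/admin/login")
blog_ok = parser.can_fetch("ExampleBot", "https://yoursite.com/blog/hello")

print(admin_ok)  # False: the path starts with /admin/
print(blog_ok)   # True: no rule matches this path
```

A polite crawler runs a check like this before every request; nothing forces it to, which is why the file only guides well-behaved bots.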

Question 2

How do I add a robots.txt file to my website?

Accepted Answer

To add a robots.txt file to your website: (1) Generate your file using this tool. (2) Save it as a plain text file named exactly robots.txt (lowercase, with the .txt extension). (3) Upload it to the root directory of your domain — it must be accessible at https://yourdomain.com/robots.txt. In WordPress, you can use an SEO plugin like Yoast or Rank Math to manage robots.txt without FTP access. After uploading, verify it is working by visiting your robots.txt URL directly in the browser.
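The verification step can also be scripted. The sketch below is one possible approach using only the Python standard library: it derives the robots.txt location from any page URL and, optionally, downloads the file. The function names and the yourdomain.com URL are illustrative assumptions, not part of this tool.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root of the scheme + host, never in a subfolder.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt; urlopen raises an HTTPError if the file is missing (404)."""
    with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_txt_url("https://yourdomain.com/blog/post?page=2"))
# https://yourdomain.com/robots.txt
```

Calling fetch_robots_txt("https://yourdomain.com/") after uploading should return the exact text you saved; an error at that point usually means the file is in the wrong directory or has the wrong name.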

Question 3

What is the difference between Allow and Disallow in robots.txt?

Accepted Answer

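Disallow tells a crawler not to access a specific path or set of paths. Rules are prefix matches, so Disallow: /admin blocks every URL whose path begins with /admin, including /admin/settings and also /administrator; use Disallow: /admin/ to target only the directory. Allow explicitly permits a path that would otherwise be blocked by a broader Disallow rule. For example, if you set Disallow: /products but want /products/featured to be crawlable, add Allow: /products/featured to the same user-agent group. When both an Allow and a Disallow rule match the same URL, Googlebot (following the REP standard, RFC 9309) applies the most specific rule, meaning the one with the longest matching path, and Allow wins a tie. Rule order does not matter to Googlebot, but some simpler parsers apply the first matching rule, so listing the Allow line before the broader Disallow is the safest arrangement. An empty Disallow (Disallow:) blocks nothing and is equivalent to allowing full crawling.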
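The "most specific rule wins" behavior can be sketched in a few lines. This is a simplified illustration, not Googlebot's actual code: it handles plain path prefixes only and ignores the * and $ wildcards, and the is_allowed helper and rule format are assumptions made for the example.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled.

    rules is a list of (directive, prefix) tuples, where directive is
    "allow" or "disallow". The longest matching prefix wins, and an
    Allow rule wins when two matching prefixes have the same length.
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix == "":
            continue  # an empty rule (e.g. "Disallow:") matches nothing
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_for_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed


rules = [("disallow", "/products"), ("allow", "/products/featured")]

print(is_allowed(rules, "/products/featured/sale"))  # True: the Allow rule is longer
print(is_allowed(rules, "/products/shoes"))          # False: only the Disallow matches
print(is_allowed(rules, "/blog"))                    # True: no rule matches
```

Because the decision depends on prefix length rather than position, swapping the two rules in the list gives the same results.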

Question 4

How do I block AI crawlers like GPTBot from my website?

Accepted Answer

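To block AI training crawlers, add a dedicated user-agent group for each one. Its position in the file does not matter, because a compliant crawler follows the group that matches its own name and falls back to User-agent: * only when no named group applies. Common AI-related crawlers include GPTBot (OpenAI), CCBot (Common Crawl), Claude-Web (Anthropic), Omgilibot, PerplexityBot, and Applebot-Extended (Apple AI). User-agent tokens change over time (for example, Anthropic's documentation currently lists ClaudeBot as its crawler), so check each vendor's documentation before relying on a list. To block them all, add a group for each with Disallow: /. Use this generator's "Block AI Crawlers" preset to add all known AI crawlers at once. Note that compliance with robots.txt is voluntary, and malicious scrapers may ignore it. For stronger protection, block unwanted bots with your web server's firewall or Cloudflare WAF rules.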
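A minimal sketch of what such a file looks like, built and then checked with Python's standard library. The block_crawlers helper is hypothetical, and the three names in AI_CRAWLERS are only an illustrative subset of the crawlers mentioned in this answer.

```python
from urllib.robotparser import RobotFileParser


def block_crawlers(user_agents, general_rules="User-agent: *\nAllow: /\n"):
    """Build robots.txt text with one 'Disallow: /' group per listed crawler."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in user_agents]
    # A blank line separates each group from the next.
    return "\n".join(groups + [general_rules])


# Illustrative subset; extend it with whichever crawlers you want to block.
AI_CRAWLERS = ["GPTBot", "CCBot", "PerplexityBot"]

robots_txt = block_crawlers(AI_CRAWLERS)
print(robots_txt)

# Sanity check: the AI crawler is blocked, a regular search crawler is not.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
gptbot_blocked = not parser.can_fetch("GPTBot", "https://yoursite.com/articles/1")
googlebot_ok = parser.can_fetch("Googlebot", "https://yoursite.com/articles/1")
```

The final check only confirms that the rules say what you intended; it cannot tell you whether a given bot will actually obey them.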

Question 5

What happens if I do not have a robots.txt file?

Accepted Answer

If you do not have a robots.txt file, most well-behaved crawlers, including Googlebot, treat the missing file as permission to crawl your entire site. This is usually fine for small websites or sites where all pages are intended to be indexed. On larger sites, however, going without a robots.txt file means giving up the simplest way to steer crawl budget, so crawlers may spend their time on pages that add little SEO value, like pagination pages, internal search result pages, or admin interfaces. Adding even a minimal robots.txt with just a Sitemap directive is good practice, because it points crawlers to your sitemap.
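To make the minimal case concrete, the sketch below parses a robots.txt that contains nothing but a Sitemap line, using Python's urllib.robotparser (site_maps() requires Python 3.8 or later). The sitemap URL and the ExampleBot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: no crawl rules, only a pointer to the sitemap.
MINIMAL_ROBOTS_TXT = "Sitemap: https://yoursite.com/sitemap.xml\n"

parser = RobotFileParser()
parser.parse(MINIMAL_ROBOTS_TXT.splitlines())

# With no Disallow rules, every path remains crawlable.
everything_ok = parser.can_fetch("ExampleBot", "https://yoursite.com/any/page")
sitemaps = parser.site_maps()

print(everything_ok)  # True
print(sitemaps)       # ['https://yoursite.com/sitemap.xml']
```

In other words, this file changes nothing about what may be crawled; it simply advertises where the sitemap lives.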

Question 6

Does robots.txt protect my website from hackers?

Accepted Answer

No. Robots.txt is not a security mechanism. It is a public file that any person or bot can read, and it is voluntary — only well-behaved crawlers follow it. Listing paths in Disallow actually tells potential attackers exactly where your sensitive directories are. For real security, protect sensitive pages with authentication, server-side access controls, and firewall rules. Use robots.txt solely to guide search engines and manage crawl budget, never to hide sensitive content.
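To see why Disallow lines are not a hiding place, consider how little effort it takes to read them. The sketch below pulls every disallowed path out of a robots.txt file; the disallowed_paths helper and the example file are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in the given robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that tries to "hide" sensitive directories.
EXAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/   # not actually hidden
Allow: /
"""

print(disallowed_paths(EXAMPLE))  # ['/admin/', '/backups/']
```

Anyone can run the same few lines against your public robots.txt, so paths listed there need real access controls behind them.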

Question 7

What is Crawl-delay and should I use it?

Accepted Answer

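Crawl-delay asks a crawler to wait a specified number of seconds between successive requests to your server. For example, Crawl-delay: 10 asks the crawler to wait 10 seconds between page fetches. It is a non-standard directive (it is not part of RFC 9309), so support varies from crawler to crawler. It can be useful for smaller servers that may get overloaded by aggressive crawling. Important note: Googlebot does not honor Crawl-delay. Google retired the crawl rate limiter in Search Console in early 2024; Googlebot now adjusts its crawl rate automatically based on how your server responds, and Google's guidance for an urgent slowdown is to temporarily return 500, 503, or 429 status codes. Bingbot does honor Crawl-delay. Support among other crawlers varies, so check each crawler's documentation. Use Crawl-delay only if your server is struggling to handle crawler traffic, because unnecessarily high delays can slow down how quickly the search engines that honor it discover and index new content.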
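From the crawler's side, honoring the directive might look like the sketch below, which reads the delay with Python's urllib.robotparser (crawl_delay() is available from Python 3.6). The robots.txt contents, the delay_for helper, and the ExampleBot name are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a 10-second delay for bingbot, no limits for others.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(agent: str, default: float = 0.0) -> float:
    """Seconds a crawler named agent should wait between requests."""
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    return default if delay is None else float(delay)


print(delay_for("bingbot"))     # 10.0
print(delay_for("ExampleBot"))  # 0.0
```

A crawler that respects the directive would sleep for delay_for(...) seconds between fetches; one that ignores it simply never performs this lookup.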

Robots.txt Generator
