Robots.txt Generator
Build a robots.txt file visually. Add crawler rules, disallow paths from presets, and include your sitemap URL. Copy the ready-to-deploy output.
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
The File That Tells Crawlers Where They're Welcome
Every website on the internet implicitly invites web crawlers to index it. The robots.txt file is how you manage that invitation: which parts of your site crawlers may fetch, which parts they should skip, and where your sitemap lives. It's one of the oldest and most widely respected conventions on the web, dating to 1994 and honoured by Googlebot, Bingbot, and hundreds of other well-behaved crawlers.
Getting robots.txt right matters more than most developers realise. Too permissive and you waste Google's crawl budget on pages that shouldn't be indexed (internal search results, session-based URLs, admin panels). Too restrictive and you accidentally block crawlers from pages — or assets like CSS and JavaScript — that need to be accessible for your content to rank well.
The Crawl Budget Problem
Google doesn't crawl every page of every site every day. It allocates a crawl budget based on your site's size, authority, and how fast it serves responses. For large sites, crawl budget is a real constraint — Googlebot may not discover new pages promptly if it's spending time crawling thousands of internal search result URLs with no unique content value.
Blocking URL patterns that generate duplicate or thin content (internal search pages, filter combinations in e-commerce, date-based archive URLs) frees crawl budget for your actual content pages. The returns from this are felt primarily on larger sites, but the discipline of keeping robots.txt clean is worth establishing early.
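One rough way to gauge how much crawl activity goes to low-value URLs is to classify a sample of requested URLs, for example pulled from server logs, by pattern. The sketch below is illustrative only: the sample URLs, the /search prefix, and the sort and color parameter names are assumptions, so substitute the patterns your own site produces.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical patterns; adjust them to your site's URL structure.
LOW_VALUE_PATH_PREFIXES = ("/search",)
LOW_VALUE_PARAMS = {"sort", "color"}


def is_low_value(url: str) -> bool:
    """Return True if the URL looks like internal search or a filter/sort variant."""
    parts = urlsplit(url)
    # Prefix match, mirroring how robots.txt Disallow rules match paths.
    if parts.path.startswith(LOW_VALUE_PATH_PREFIXES):
        return True
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name in LOW_VALUE_PARAMS for name in params)


def low_value_share(urls) -> float:
    """Fraction of the given URLs that match a low-value pattern."""
    if not urls:
        return 0.0
    return sum(is_low_value(u) for u in urls) / len(urls)


# Made-up sample standing in for URLs extracted from crawler log entries.
sample_urls = [
    "https://example.com/blog/robots-txt-guide",
    "https://example.com/search?q=shoes",
    "https://example.com/shoes?sort=price&color=red",
    "https://example.com/shoes",
]
share = low_value_share(sample_urls)
```

A high share on a real log sample suggests that the matching patterns are candidates for Disallow rules; a low share suggests crawl budget is not the bottleneck.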
What robots.txt Cannot Do
Three common misconceptions. First: Disallow does not make a page private. robots.txt is publicly visible at /robots.txt, and attackers specifically check it to map site structure. Second: Disallow does not remove a page from search results. Google can still list a disallowed URL, without a snippet, if other pages link to it. Third: malicious crawlers and scrapers ignore robots.txt entirely because nothing forces them to follow the protocol. robots.txt is a courtesy mechanism for well-behaved bots only.
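For pages you want out of search results: add a noindex robots meta tag using the Meta Tag Generator, and leave the page crawlable, because a crawler that is blocked by Disallow never fetches the page and so never sees the noindex. For pages that should be inaccessible: use server authentication or access controls, not robots.txt.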
Paths That Commonly Need Disallow Rules
- /admin/ and /wp-admin/ — Admin interfaces have no public SEO value and exposing their existence in search results is a minor security signal. Block them.
- /api/ endpoints — REST API endpoints return JSON, not crawlable HTML content. Blocking them conserves crawl budget for content pages.
- /search? — Internal search result pages are almost always thin content (query-dependent, often no unique text). They dilute crawl budget and can cause duplicate content issues.
- /cart/, /checkout/, /account/ — Transaction and account pages have no SEO value and often contain session-specific content.
- URL parameter patterns: Filter and sort parameters in e-commerce (?sort=price&color=red) generate thousands of near-duplicate URLs. Block the parameter patterns if your CMS doesn't handle canonical tags for them.
After setting up robots.txt, add canonical tags via the Meta Tag Generator to the duplicate-prone URLs you leave crawlable. Crawlers cannot read a canonical tag on a page they are blocked from fetching, so the two measures cover different URLs rather than stacking on the same one.
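The path list above can be turned into a file and sanity-checked with a few lines of Python. This is a minimal sketch, not how this generator works internally: build_robots_txt is a hypothetical helper, and example.com is a placeholder domain. The check uses the standard library's urllib.robotparser, which does simple prefix matching and may not agree with Googlebot on every edge case (wildcards in particular).

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow_paths, sitemap_url=None, user_agent="*"):
    """Assemble robots.txt text from a list of path prefixes (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    if sitemap_url:
        # Blank line separates the rule group from the sitemap reference.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    ["/admin/", "/api/", "/cart/", "/checkout/", "/account/"],
    sitemap_url="https://example.com/sitemap.xml",
)

# Parse the generated text locally; no network request is made.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask how a compliant crawler would treat two sample URLs.
admin_allowed = parser.can_fetch("*", "https://example.com/admin/login")
blog_allowed = parser.can_fetch("*", "https://example.com/blog/post")
```

Checking a handful of URLs you expect to be blocked and a handful you expect to stay crawlable is a cheap guard against an overly broad rule.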
Deploying Your robots.txt
Generate the file here, copy the output, and save it as robots.txt at the exact root of your web server, so that it is accessible at https://yourdomain.com/robots.txt. Hosting platforms such as Vercel and Netlify typically serve static files from a public/ or publish directory, while Apache and Nginx serve them from the configured document root. After deploying, open the URL to verify accessibility, then check the robots.txt report in Google Search Console (the standalone robots.txt tester has been retired) to confirm Googlebot can fetch and parse the file.
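The accessibility check can also be scripted. The sketch below is one possible approach under stated assumptions: check_robots_response and fetch_and_check are hypothetical helper names, and the three checks (HTTP 200, a text/plain content type, at least one User-agent line) are heuristics chosen for illustration, not an official validation procedure.

```python
from urllib.request import urlopen


def check_robots_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"expected text/plain, got {content_type!r}")
    if "user-agent:" not in body.lower():
        problems.append("no User-agent line found")
    return problems


def fetch_and_check(site_root: str) -> list[str]:
    """Fetch <site_root>/robots.txt and run the checks above.

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a missing file surfaces as an exception rather than a problem entry.
    """
    url = site_root.rstrip("/") + "/robots.txt"
    with urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return check_robots_response(
            resp.status, resp.headers.get("Content-Type", ""), body
        )


# Offline example using a hand-written response instead of a live request.
ok_problems = check_robots_response(
    200, "text/plain; charset=utf-8", "User-agent: *\nDisallow: /admin/\n"
)
```

Calling fetch_and_check("https://yourdomain.com") after a deploy would run the same checks against the live file; a common failure it catches is a single-page app returning its HTML shell for /robots.txt.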
✓Verified by ToollyX Team · Last updated June 2026
Disclaimer: robots.txt content is generated in your browser. Deploy the output to your web server root. robots.txt is not a security mechanism.