How to Check robots.txt & sitemap.xml Correctly
The robots.txt and sitemap.xml files are the foundation for Google to crawl and index your site correctly. A misconfiguration in either can make important pages disappear from search. This guide shows how to check both properly.
robots.txt — control crawling
The robots.txt file lives at https://domain.com/robots.txt and tells search engines which pages they may crawl.
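To see how a crawler reads these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and URLs are hypothetical examples. One caveat: this parser applies the first matching rule in file order, while Google applies the most specific (longest) matching rule. Treat the result as an approximation when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; swap in your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart

Sitemap: https://domain.com/sitemap.xml
"""


def allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if user_agent may crawl url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://domain.com/blog/post-1",
                "https://domain.com/admin/login",
                "https://domain.com/cart?item=42"):
        verdict = "allowed" if allowed(ROBOTS_TXT, url) else "blocked"
        print(f"{url} -> {verdict}")
```

The blog post comes back as allowed, and the admin and cart URLs come back as blocked. Rules are prefix matches, so `Disallow: /cart` would also block a path such as `/cartoons`.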
Common robots.txt mistakes
- Blocking the whole site: a `Disallow: /` line (often left over after deploying from staging).
- Blocking important folders by mistake: e.g. `Disallow: /blog`.
- Forgetting the Sitemap declaration: add `Sitemap: https://domain.com/sitemap.xml`.
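The three mistakes above can be caught automatically. This minimal sketch (Python 3.9+) again assumes Python's standard-library `urllib.robotparser`, which uses first-match rather than Google's longest-match precedence. It flags a blocked homepage, blocked important paths, and a missing Sitemap line. The function name `audit_robots` and the list of important paths are hypothetical, so choose the paths that matter for your site.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, site: str, important_paths: list[str],
                 user_agent: str = "Googlebot") -> list[str]:
    """Hypothetical helper: return warnings for common robots.txt mistakes."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    warnings: list[str] = []

    # Mistake 1: whole site blocked, e.g. a "Disallow: /" left over from staging.
    if not parser.can_fetch(user_agent, site + "/"):
        warnings.append("homepage is blocked (leftover 'Disallow: /'?)")

    # Mistake 2: an important folder blocked by a too-broad Disallow rule.
    for path in important_paths:
        if not parser.can_fetch(user_agent, site + path):
            warnings.append(f"important path is blocked: {path}")

    # Mistake 3: no Sitemap declaration; site_maps() returns None when absent.
    if not parser.site_maps():
        warnings.append("no 'Sitemap:' line found")

    return warnings
```

For example, passing a staging leftover of `User-agent: *` plus `Disallow: /` with `["/blog/"]` as the important paths returns three warnings: the blocked homepage, the blocked `/blog/`, and the missing Sitemap line.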
Check it now: the free Robots.txt Checker parses every Allow and Disallow rule.
sitemap.xml — help index faster
A sitemap lists the URLs you want Google to index, which is especially important for large or new sites.
What to check in a sitemap
- No more than 50,000 URLs per file, and no more than 50 MB uncompressed.
- A `<lastmod>` tag so Google knows when content changed.
- Correct structure (a sitemap index pointing to child sitemaps when there are many URLs).
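The sitemap checks above can be scripted as well. Below is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes an uncompressed sitemap in the standard sitemaps.org namespace, and `check_sitemap` and the shape of its result are hypothetical. It tells a sitemap index apart from a plain URL set, applies the 50,000-entry and 50 MB limits from the sitemaps.org protocol, and reports URLs without `<lastmod>`.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_ENTRIES = 50_000              # sitemaps.org limit per sitemap or index file
MAX_BYTES = 50 * 1024 * 1024      # 50 MB uncompressed


def check_sitemap(xml_text: str) -> dict:
    """Hypothetical checker: summarise a sitemap and list any problems."""
    problems: list[str] = []
    if len(xml_text.encode("utf-8")) > MAX_BYTES:
        problems.append("file is larger than 50 MB uncompressed")

    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix

    if kind == "sitemapindex":
        # An index only points at child sitemaps; each child needs its own check.
        children = [(el.text or "").strip()
                    for el in root.findall("sm:sitemap/sm:loc", NS)]
        if len(children) > MAX_ENTRIES:
            problems.append(f"index lists {len(children)} sitemaps (limit {MAX_ENTRIES})")
        return {"type": "index", "children": children, "problems": problems}

    if kind != "urlset":
        raise ValueError(f"unexpected root element <{kind}>")

    urls = root.findall("sm:url", NS)
    if len(urls) > MAX_ENTRIES:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_ENTRIES} limit; "
                        "split into several sitemaps")
    # Collect the <loc> of every URL entry that has no <lastmod> child.
    no_lastmod = [u.findtext("sm:loc", default="", namespaces=NS).strip()
                  for u in urls if u.find("sm:lastmod", NS) is None]
    if no_lastmod:
        problems.append(f"{len(no_lastmod)} URL(s) without <lastmod>, e.g. {no_lastmod[0]}")
    return {"type": "urlset", "url_count": len(urls), "problems": problems}
```

For a sitemap index, you would run the same function on each child URL in `children`.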
A check routine after every deploy
- Open the Robots.txt Checker and confirm nothing is blocked by mistake.
- Open the Sitemap Checker and confirm the URL count and lastmod values.
- Submit the sitemap in Google Search Console.
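The first two steps of this routine could also run automatically as a post-deploy job. The sketch below uses only Python's standard library and rests on several assumptions:

- `post_deploy_check` is a hypothetical name.
- robots.txt lives at the site root.
- Sitemaps are plain (not gzipped) XML in the sitemaps.org namespace.
- `/sitemap.xml` is used as a fallback when robots.txt declares no sitemap.

The Search Console submission stays manual. The `fetch` parameter is injectable, so the logic can be tested without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable
from urllib.robotparser import RobotFileParser

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # sitemaps.org namespace prefix


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download url and return its body; HTTP 4xx/5xx raise HTTPError (an OSError)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def post_deploy_check(base_url: str,
                      fetch: Callable[[str], str] = fetch_text) -> list[str]:
    """Hypothetical post-deploy gate: return a list of problems (empty means pass)."""
    base_url = base_url.rstrip("/")
    try:
        robots_txt = fetch(base_url + "/robots.txt")
    except OSError as exc:
        return [f"could not fetch robots.txt: {exc}"]

    problems: list[str] = []
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("Googlebot", base_url + "/"):
        problems.append("robots.txt blocks the homepage")

    # Use the sitemaps declared in robots.txt, else assume the conventional location.
    for sitemap_url in parser.site_maps() or [base_url + "/sitemap.xml"]:
        try:
            root = ET.fromstring(fetch(sitemap_url))
        except (OSError, ET.ParseError) as exc:
            problems.append(f"{sitemap_url}: {exc}")
            continue
        if root.tag == SM + "urlset":
            urls = root.findall(SM + "url")
            missing = sum(1 for u in urls if u.find(SM + "lastmod") is None)
            if not urls:
                problems.append(f"{sitemap_url}: no URLs listed")
            if missing:
                problems.append(f"{sitemap_url}: {missing} URL(s) without <lastmod>")
        elif root.tag != SM + "sitemapindex":
            problems.append(f"{sitemap_url}: unexpected root element {root.tag}")
    return problems
```

In a CI job you might print whatever `post_deploy_check("https://domain.com")` returns and fail the build if the list is non-empty. After that, submit the sitemap in Search Console by hand.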
Conclusion
Make checking robots.txt and sitemap a habit after every major change. Two minutes of checking can save months of lost traffic from a config error.
Frequently asked questions
How are robots.txt and sitemap.xml different?
How do I know if robots.txt blocks by mistake?
How often should I check?