Technical SEO Essentials, Lesson 2

Site Architecture & Crawlability

Core
6 min
intermediate

You'll learn:

  • Design a crawlable site structure
  • Use robots.txt and XML sitemaps effectively

Crawlability: Can Google Find Your Pages?

Before a page can rank, Google must discover it. Crawlers navigate your site through links. A confusing architecture means important pages might never get indexed.

Flat vs. Deep Site Architecture

Aim for a "flat" architecture where important pages are just a few clicks from the homepage:

  • Good (flat): Home → Category → Product (2 clicks from the homepage)
  • Bad (deep): Home → 2024 → Blog → Category → Post → Page (5+ clicks from the homepage)
💡

Google allots each site a "crawl budget": roughly, the set of URLs Googlebot can and wants to crawl in a given period. It matters most on large sites, where a complex architecture can waste that budget on unimportant pages.
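The flat vs. deep comparison above comes down to click depth: the fewest clicks needed to reach a page from the homepage. A minimal sketch of how that could be measured with a breadth-first search, assuming you already have your internal links as a dictionary. The `site` map, its paths, and the two-click threshold are hypothetical and only for illustration.

```python
from collections import deque


def click_depths(links, home="/"):
    """Return the fewest clicks needed to reach each page from `home`.

    `links` maps a page path to the list of pages it links to.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal-link map (not taken from a real site).
site = {
    "/": ["/shoes/", "/blog/"],
    "/shoes/": ["/shoes/trail-runner"],
    "/blog/": ["/blog/2024/"],
    "/blog/2024/": ["/blog/2024/post"],
}

depths = click_depths(site)
# Flag pages deeper than an assumed threshold of two clicks.
deep_pages = [page for page, depth in depths.items() if depth > 2]
print(deep_pages)
```

Pages that never appear in `depths` are not reachable through links from the homepage at all, which is usually worth investigating before worrying about depth.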

Internal Linking Best Practices

  • Use descriptive anchor text: "SEO guide" not "click here"
  • Link to related content: keep users on your site longer
  • Add HTML sitemaps: linked from footer for users and crawlers
  • Use breadcrumbs: show users (and Google) your site hierarchy
  • Fix broken links: regularly check for 404 errors
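The anchor-text advice in the list above can be spot-checked automatically. The following is a rough sketch, using only Python's standard `html.parser`, that collects links from a page and flags those with generic anchor text. The `GENERIC_ANCHORS` set and the sample HTML are assumptions chosen for illustration, not an official list.

```python
from html.parser import HTMLParser

# Assumed examples of non-descriptive anchor text; adjust for your own site.
GENERIC_ANCHORS = {"click here", "here", "read more", "learn more", "this page"}


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def generic_anchor_links(html):
    """Return links whose anchor text is in GENERIC_ANCHORS."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if text.lower() in GENERIC_ANCHORS]


sample = ('<p>Read our <a href="/seo-guide">SEO guide</a> or '
          '<a href="/pricing">click here</a>.</p>')
print(generic_anchor_links(sample))
```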

Robots.txt: Controlling Crawler Access

The robots.txt file tells crawlers which parts of your site to avoid. It's useful for preventing crawlers from accessing admin areas, staging sites, or resource-intensive pages.

# robots.txt
# Rules apply to all crawlers; anything not disallowed is crawlable by default.
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
⚠️

Warning: robots.txt is public, so don't use it to hide sensitive information; use authentication instead. Also, Google may still index a disallowed URL (without its content) if other sites link to it. To keep a page out of the index, leave it crawlable and use a noindex directive.
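Before deploying a robots.txt file, it can help to test which URLs it actually blocks. Here is a small sketch using Python's standard `urllib.robotparser`; the rules mirror the disallow lines from the example above, and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules under test, kept in one group with no blank lines between them.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URLs to check against the rules.
blocked = parser.can_fetch("*", "https://example.com/admin/users")
allowed = parser.can_fetch("*", "https://example.com/products/widget")
print(blocked, allowed)  # False True
```

Python's parser applies rules in file order, which may differ from Googlebot's most-specific-match behaviour, so treat the result as a sanity check rather than a guarantee of how Google will read the file.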

XML Sitemaps: Your Site's Table of Contents

An XML sitemap lists all important pages on your site. Submit it to Google Search Console to help Google discover and prioritize your content.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1</loc>
    <lastmod>2026-01-24</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
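Sitemaps are usually generated rather than written by hand. The sketch below builds one with Python's standard `xml.etree.ElementTree`, assuming you can supply a list of URLs with last-modified dates. The `build_sitemap` helper and the page list are hypothetical. It omits `<priority>` because Google's documentation states that it ignores that value.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as UTF-8 bytes) from (url, lastmod) pairs.

    `lastmod` is expected as a YYYY-MM-DD string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text for us.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages; replace with data from your CMS or database.
pages = [
    ("https://example.com/page-1", "2026-01-24"),
    ("https://example.com/search?q=a&sort=new", "2026-01-20"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml.decode("utf-8"))
```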
Key Takeaway:
Make it easy for Google to crawl your site. A clear site architecture, internal links, and a proper sitemap ensure your best content gets found and indexed.