If you’ve been paying attention to how AI tools interact with the web, you’ve probably noticed a growing problem: language models are terrible at reading websites. They choke on navigation menus, sidebars, cookie banners, and JavaScript-rendered content. The information they need is buried under layers of HTML that was never designed for them.

Enter llms.txt, a simple, Markdown-formatted file that sits at your website’s root and tells AI exactly what your site is about and where to find the good stuff.

The Problem llms.txt Solves

When an AI tool like ChatGPT, Claude, or Cursor needs information from your website, it faces two fundamental challenges:

  1. Context windows have limits. Even the largest models can’t ingest an entire website at once. They need to be pointed at the right content.
  2. HTML is noisy. A typical webpage is 20% useful content and 80% markup, scripts, ads, and navigation. Converting that mess into something an LLM can actually work with is unreliable at best.

The traditional web standards don’t help here. robots.txt tells crawlers where not to go. sitemap.xml dumps every URL without any sense of priority. Neither one tells an AI model: “Here’s what actually matters on this site, in a format you can read.”

That’s the gap llms.txt fills. It’s a curated guide to your site, written in Markdown (the format language models handle natively) that lets you control what AI tools see and how they understand your content.

Who Created It

Jeremy Howard, co-founder of Answer.AI and creator of fast.ai, published the llms.txt proposal in September 2024. His reasoning was straightforward: site authors know their own content better than any automated crawler. Why not let them curate what matters most for AI consumption?

The specification lives at llmstxt.org and has since gained significant traction across the developer ecosystem.

How It Works

An llms.txt file follows a simple Markdown structure:

# Your Site Name

> A concise summary of what your site does and who it's for.

Additional context that helps an AI understand the big picture.

## Documentation

- [Getting Started](https://example.com/docs/getting-started.md): Everything you need to set up
- [API Reference](https://example.com/docs/api.md): Complete endpoint documentation
- [Tutorials](https://example.com/docs/tutorials.md): Step-by-step guides

## Blog

- [Latest Release Notes](https://example.com/blog/release-notes.md): What's new in v3.0
- [Architecture Overview](https://example.com/blog/architecture.md): How the system is designed

## Optional

- [Changelog](https://example.com/changelog.md): Full version history
- [Contributing](https://example.com/contributing.md): How to contribute

The structure breaks down as:

  • H1 heading: your site or project name (the only required element)
  • Blockquote: a brief summary with the essential context
  • Body text: additional background information
  • H2 sections: organized groups of links to your most important content
  • ## Optional section: secondary content that can be skipped when context space is tight

Notice that the links don’t point to regular HTML pages; they point to clean Markdown (.md) files. That’s where companion markdown files come in.
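The structure above is regular enough to parse mechanically. Here is a minimal sketch of a parser for the format (the function name and the dictionary layout are illustrative choices, not part of the spec):

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt file into its title, summary, and link sections."""
    title = None
    summary_lines = []
    sections = {}   # H2 section name -> list of (link text, url, description)
    current = None
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> "):
            summary_lines.append(line[2:].strip())
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # Match "- [text](url): optional description"
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
            if m:
                sections[current].append((m.group(1), m.group(2), m.group(3) or ""))
    return {"title": title, "summary": " ".join(summary_lines), "sections": sections}

sample = """# Your Site Name

> A concise summary of what your site does and who it's for.

## Documentation

- [Getting Started](https://example.com/docs/getting-started.md): Everything you need to set up
"""
parsed = parse_llms_txt(sample)
print(parsed["title"])           # Your Site Name
print(list(parsed["sections"]))  # ['Documentation']
```

Because the format is this easy to parse, any tool can turn an llms.txt file into a prioritized reading list without an HTML parser in sight.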

The Role of Companion Markdown Files

The llms.txt file itself is just an index, a table of contents. The real power comes from the companion .md files it links to. These are clean, text-only Markdown versions of your web pages, stripped of all the HTML noise.

For every important page on your site, you provide a parallel Markdown version:

HTML Page                   Markdown Companion
example.com/docs/api        example.com/docs/api.md
example.com/blog/my-post    example.com/blog/my-post.md
example.com/about           example.com/about.md

These companion files contain only the content that matters: headings, paragraphs, code examples, and lists. No navigation, no sidebar, no footer, no scripts. Just clean, structured text that a language model can read directly.

This is what makes the llms.txt ecosystem genuinely useful. The index file tells AI where to look. The companion markdown files give it what to read. Together, they create a clean, curated path through your content that works at inference time, when an AI is actively generating a response to a user’s question about your site.
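The URL convention in the table above is simple enough to generate programmatically. A minimal sketch (the append-`.md` scheme follows the examples in this article; it is an assumption, and some sites use other conventions instead):

```python
from urllib.parse import urlsplit, urlunsplit

def markdown_companion(url):
    """Map an HTML page URL to its companion Markdown URL by appending .md."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/index"   # treat the bare root page as /index
    if not path.endswith(".md"):
        path += ".md"
    # Drop query strings and fragments; companion files are static documents
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(markdown_companion("https://example.com/docs/api"))
# https://example.com/docs/api.md
```

A build step can walk your page list through a mapping like this to confirm that every important HTML page actually has a companion file published at the expected location.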

Why Markdown Specifically?

Markdown was chosen deliberately. It’s the format LLMs handle most naturally. They were trained on massive amounts of Markdown from GitHub repos, documentation sites, and developer forums. When you give a language model clean Markdown, it doesn’t need to do any conversion or interpretation. It can focus entirely on understanding the content.

How llms.txt Differs from robots.txt and sitemap.xml

These three files serve fundamentally different purposes:

robots.txt is a bouncer. It controls access -telling crawlers which parts of your site they can and can’t visit. It’s about exclusion and restriction.

sitemap.xml is a directory. It lists every page on your site with metadata like last-modified dates and change frequency. It’s comprehensive but undiscriminating -every page gets equal billing.

llms.txt is a personal tour guide. It says: “Out of everything on this site, these are the pages that matter most, organized by topic, with context about why they matter.” It’s about curation and prioritization.

Another key distinction: robots.txt and sitemaps are primarily about crawling and indexing (training data). llms.txt is designed for inference time: the moment when an AI tool is actively trying to answer a question or complete a task using your site’s content.

The llms-full.txt Variant

Some sites also provide an llms-full.txt file: a single, comprehensive Markdown document containing the full text of their documentation. While llms.txt is a lightweight index for real-time assistants, llms-full.txt is designed for deep ingestion.

Think of it this way:

  • llms.txt = a guidebook with a map and descriptions of key stops
  • llms-full.txt = the complete encyclopedia

Cloudflare’s llms-full.txt, for example, clocks in at 3.7 million tokens covering 20+ products. It’s not meant to be pasted into a chat window; it’s built for tools that can chunk and index large amounts of content.
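A file that large has to be split before it can be embedded or indexed. One common approach, sketched below, is to chunk the Markdown along heading boundaries so each chunk stays self-contained (the character budget and the splitting heuristic are illustrative assumptions, not part of the spec):

```python
def chunk_markdown(text, max_chars=2000):
    """Split a Markdown document into chunks, starting a new chunk at each
    heading and whenever the current chunk would exceed the size budget."""
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        starts_section = line.lstrip().startswith("#")
        if current and (starts_section or size + len(line) > max_chars):
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

doc = "# Payments\nHow to charge cards.\n# Webhooks\nHow to receive events.\n"
for chunk in chunk_markdown(doc):
    print(repr(chunk))
```

Because llms-full.txt is plain Markdown, even a naive splitter like this keeps headings attached to their sections, which is exactly what retrieval tools want.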

This variant was developed by Mintlify in collaboration with Anthropic and has since been adopted into the official specification.

Who’s Using llms.txt Today

Adoption has been strongest among developer-facing companies and documentation platforms:

  • Anthropic publishes a curated llms.txt index linking to a comprehensive llms-full.txt of their documentation
  • Stripe organizes their file around product verticals (Payments, Checkout, Webhooks, Testing) with descriptive context for each link
  • Cloudflare runs one of the most extensive implementations, covering their entire developer documentation suite
  • Vercel attributes a notable portion of their signups to AI-driven discovery, with llms.txt as part of their strategy
  • Cursor, Windsurf, and Bolt (AI development tools) provide focused llms.txt files prioritizing setup and configuration workflows

Documentation platforms like Mintlify and GitBook have built native llms.txt generation into their products. In the WordPress ecosystem, plugins from Yoast and Rank Math added automatic generation features, and dedicated plugins like DesignSetGo provide full llms.txt management with companion markdown file generation.

The Honest Assessment: What llms.txt Can and Can’t Do

Let’s be straightforward about where things stand.

What llms.txt does well:

  • Developer tools love it. AI coding assistants like Cursor have explicit support for ingesting llms.txt files. Point it at a documentation URL and it indexes the content for reference during coding sessions.
  • Manual context loading works. When you paste llms.txt content into ChatGPT or Claude, the models can use it effectively. It’s a clean, structured way to provide context.
  • It improves content quality for AI. By curating and providing Markdown versions of your content, you ensure AI tools work with accurate, up-to-date information rather than poorly scraped HTML.

What’s still uncertain:

  • No major LLM provider has confirmed automatic consumption. OpenAI, Anthropic, Google, and others haven’t publicly stated that their models automatically discover and use llms.txt during inference. Google’s John Mueller has been particularly skeptical.
  • It’s not a ranking factor. If you’re implementing llms.txt purely because you think it’ll boost your visibility in ChatGPT or Perplexity answers, the evidence doesn’t support that claim yet.
  • It’s a community convention, not a web standard. There’s no W3C or IETF backing. It’s a proposal with growing adoption, but adoption alone doesn’t make it a standard.

The balanced take:

llms.txt is most valuable when you think of it as good content hygiene for the AI era rather than an SEO silver bullet. It forces you to think about which content matters most, how it’s organized, and whether it’s accessible in a clean format. Those are useful exercises regardless of whether any particular AI tool automatically reads your file.

Best Practices for Implementation

If you’re ready to add llms.txt to your site, here’s what works:

Curate aggressively. This isn’t a sitemap. Don’t include every page. Focus on the 20-30 most important resources that would genuinely help someone (or something) understand your site.

Write descriptive link text. Every entry should include a brief explanation:

- [Authentication Guide](https://example.com/docs/auth.md): OAuth 2.0 setup, API key management, and session handling

Order by importance. Put your most critical content first within each section. Position in the file acts as a priority signal.

Provide real Markdown companions. Link to actual .md files, not HTML pages. The value of llms.txt drops significantly if the linked resources are still full of HTML noise.

Keep it synchronized. When your site content changes, your llms.txt and companion files should update too. Stale links and outdated content undermine the entire purpose.

Keep the index file small. Aim for under 10KB for the main llms.txt file. Use llms-full.txt if you need to provide comprehensive content.

Review quarterly. Set a reminder to audit your llms.txt file and ensure it still reflects your site’s most important content.
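Several of these practices can be checked mechanically. A minimal linter sketch (the 10KB threshold mirrors the guideline above; the specific checks and messages are my own assumptions, not requirements of the spec):

```python
import re

def lint_llms_txt(text):
    """Flag common llms.txt issues: missing H1, oversized file, non-.md links."""
    warnings = []
    if len(text.encode("utf-8")) > 10 * 1024:
        warnings.append("index exceeds 10KB; consider moving content to llms-full.txt")
    if not any(line.startswith("# ") for line in text.splitlines()):
        warnings.append("missing H1 title (the only required element)")
    for _label, url in re.findall(r"\[(.+?)\]\((\S+?)\)", text):
        if not url.endswith(".md"):
            warnings.append(f"link does not point to a Markdown companion: {url}")
    return warnings

good = "# Site\n\n- [Docs](https://example.com/docs.md): setup guide\n"
print(lint_llms_txt(good))  # []
```

Run something like this in CI alongside a link checker, and the quarterly review becomes a matter of reading a short report rather than auditing the file by hand.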

Looking Ahead

The web is shifting. AI tools are becoming a primary way people discover and interact with online content. Whether or not any specific language model automatically reads llms.txt today, the trajectory is clear: websites that make their content accessible to AI will have an advantage over those that don’t.

llms.txt represents a pragmatic bet on that future. It’s low-cost to implement, it improves your content organization regardless of AI consumption, and it positions your site to benefit as AI tools evolve their content discovery mechanisms.

The question isn’t whether AI will get better at reading websites. It will. The question is whether you’ll be ready when it does, with clean, curated content that tells AI exactly what it needs to know.


llms.txt was proposed by Jeremy Howard of Answer.AI in September 2024. The specification is maintained at llmstxt.org. For WordPress sites, DesignSetGo provides built-in llms.txt generation with automatic companion markdown files for all your published content.