Markdown for AI Agents

by Selvakumar Duraipandian

Description

Markdown for AI Agents is a lightweight WordPress plugin that enables HTTP content negotiation for your site’s content. When a client (like an AI agent or a custom script) requests a page with the Accept: text/markdown header, the plugin intercepts the request and returns a clean, structured Markdown representation of the post or page content.

This is ideal for AI crawlers, RAG (Retrieval-Augmented Generation) systems, and non-browser clients that prefer machine-friendly text over complex HTML.

Important note: This plugin is primarily a developer/integration tool. Human visitors browsing your site will never see any difference — the Markdown output is only served when explicitly requested via the Accept: text/markdown HTTP header. Normal browser requests always receive the standard HTML page.

Key Features:

  • Automatically detects Accept: text/markdown headers.
  • Converts HTML content to clean Markdown using the League HTMLToMarkdown library.
  • Strips away theme layout, navigation, headers, footers, and sidebars — serving only the main content.
  • Adds useful HTTP response headers: Content-Type: text/markdown, Vary: Accept, and X-Markdown-Word-Count.
  • Respects WordPress visibility rules and filters.
  • No configuration required — works out of the box for posts, pages, and custom post types.

How It Works

This plugin uses a standard web technique called HTTP content negotiation. The same URL on your site can serve different representations of the same content depending on what the client asks for:

  • A regular browser sends Accept: text/html and receives your normal HTML page.
  • An AI agent sends Accept: text/markdown and receives a clean Markdown version of the same page.

No extra URLs, no duplicate content, no configuration needed. The plugin hooks into WordPress’s template_redirect action, detects the Accept header, captures the rendered HTML, converts it to Markdown, and returns it with appropriate headers.
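
The detection step can be illustrated with a short sketch. This is not the plugin's own code (the plugin is written in PHP and its detection logic is not shown here); it is a minimal Python illustration of deciding, from an Accept header value, whether a client explicitly asked for Markdown. The function name wants_markdown is hypothetical.

```python
def wants_markdown(accept_header: str) -> bool:
    """Return True if the Accept header explicitly lists text/markdown."""
    for item in accept_header.split(","):
        # Each item looks like "type/subtype" or "type/subtype;q=0.8".
        media_type, *params = [part.strip() for part in item.split(";")]
        if media_type.lower() != "text/markdown":
            continue
        # A quality value of 0 means "not acceptable", so treat it as a refusal.
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


print(wants_markdown("text/markdown"))
print(wants_markdown("text/html,application/xhtml+xml,*/*;q=0.8"))
```

Note that a wildcard such as */* does not count as a request for Markdown in this sketch, which matches the idea that ordinary browsers keep receiving HTML.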

Why Markdown for AI Agents?

When building RAG (Retrieval-Augmented Generation) applications or AI pipelines that ingest web content, HTML is extremely noisy. A typical WordPress page contains thousands of tokens' worth of HTML tags, inline styles, navigation menus, scripts, and layout markup, none of which carries meaning for an AI model.

Serving clean Markdown instead can reduce token consumption by up to 60%, which means:

  • Lower API costs — fewer tokens ingested when loading pages into vector stores or LLM pipelines.
  • Faster processing — less text for the model to parse, filter, and discard.
  • Better retrieval accuracy — higher signal-to-noise ratio improves the quality of RAG results.
  • Simpler pipelines — no need for custom HTML stripping logic on the client side; the plugin handles it server-side.
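
To make the "up to 60%" figure concrete, here is a small worked calculation. The page size and page count are hypothetical numbers chosen for illustration, and 60% is the best case quoted above, not a measured result for any particular site.

```python
def tokens_after_reduction(html_tokens: int, reduction: float) -> int:
    """Tokens left once a fraction `reduction` (0.0 to 1.0) has been removed."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(html_tokens * (1.0 - reduction))


# Hypothetical figures: a 5,000-token HTML page and 200 pages ingested.
html_tokens = 5_000
pages = 200

markdown_tokens = tokens_after_reduction(html_tokens, 0.60)
saved = (html_tokens - markdown_tokens) * pages

print(markdown_tokens)  # 2000 tokens per page instead of 5000
print(saved)            # 600000 tokens saved across all pages
```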

Any AI agent, crawler, or ingestion script that sends Accept: text/markdown in its request header will automatically receive the clean Markdown version — no extra URLs, no separate endpoints, no changes to your content workflow.

Installation

  1. Upload the markdown-for-ai-agents folder to the /wp-content/plugins/ directory, or install it directly via the WordPress Plugins screen.
  2. Activate the plugin through the Plugins menu in WordPress.
  3. The plugin works immediately with no settings to configure.

Verifying it works — Option 1: Using curl (recommended for developers)

Open your terminal and run the following command, replacing the URL with any post or page on your site:

curl -H "Accept: text/markdown" https://your-site.com/sample-page/

You should see plain Markdown text returned instead of HTML. For example:

curl -H "Accept: text/markdown" https://your-site.com/hello-world/

A successful response will begin with the post title as a Markdown heading (e.g. # Hello World) followed by the post content in Markdown format, with no HTML tags, navigation, or sidebar content.
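
For ingestion scripts, the same check can be done from Python. The sketch below uses only the standard library; the helper names (build_markdown_request, looks_like_markdown, fetch_markdown) are hypothetical, and the "starts with a # heading" test is a rough heuristic based on the description above, not a guarantee of the plugin's output format.

```python
import urllib.request


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the Markdown representation."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def looks_like_markdown(body: str) -> bool:
    """Rough check: body opens with a '# ' heading and contains no <html tag."""
    text = body.lstrip()
    return text.startswith("# ") and "<html" not in text.lower()


def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Perform the request and return the decoded response body."""
    request = build_markdown_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

A typical call would be looks_like_markdown(fetch_markdown("https://your-site.com/hello-world/")), with the URL replaced by a post on your own site.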

Verifying it works — Option 2: Using a browser extension (no terminal required)

  1. Install the ModHeader extension for Chrome or Firefox (free, available in their respective extension stores).
  2. Open ModHeader and add a new request header: Name = Accept, Value = text/markdown.
  3. Enable the header and visit any post or page on your WordPress site.
  4. Your browser will display the raw Markdown text of that page instead of the styled HTML version.
  5. Disable or remove the header in ModHeader when you are done to return to normal browsing.

Verifying it works — Option 3: Using an online HTTP client

Tools like Hoppscotch (free, browser-based) allow you to make HTTP requests with custom headers without installing anything:

  1. Go to https://hoppscotch.io
  2. Enter any post or page URL from your site.
  3. Under Headers, add Accept = text/markdown.
  4. Click Send and you will see the Markdown response in the response panel.

    Sample Markdown output returned in a terminal using curl -H "Accept: text/markdown" on a WordPress post. The response shows the post title as a top-level Markdown heading, followed by the post body in plain Markdown, with all HTML, navigation, and theme chrome stripped away.

Does this change what human visitors see?

No. Standard browser requests always receive the normal HTML version of your pages. The Markdown output is only served when a client explicitly includes Accept: text/markdown in its HTTP request header. No regular browser sends this header by default.

Which post types are supported?

All singular content types are supported: standard Posts, Pages, and any registered Custom Post Types. Archive pages, category pages, and the homepage (if set to a blog feed) are not served as Markdown.

How do I know the plugin is working?

Use any of the three verification methods described in the Installation section above. The quickest check is: curl -H "Accept: text/markdown" https://your-site.com/sample-page/. A working response will return plain text starting with your post title as a Markdown # heading.

What content is included in the Markdown output?

The Markdown output contains the full rendered post or page content — the title and body — converted to Markdown. Navigation menus, sidebars, footers, and <script> and <style> tags are automatically stripped out to provide a clean, token-efficient result for AI consumption.
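
The stripping idea can be sketched in a few lines. The plugin itself relies on the PHP League HTMLToMarkdown library; the Python below is a stand-in built on the standard library's html.parser, written only to show the principle. It is deliberately minimal: it keeps headings and paragraphs, skips everything inside a few layout or script elements, and ignores inline formatting, links, and lists. The set of skipped tags is an assumption for the example.

```python
from html.parser import HTMLParser

SKIPPED = {"script", "style", "nav", "aside", "footer"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MainContentToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside a skipped element
        self.prefix = None    # Markdown prefix of the open block, if any
        self.parts = []       # text fragments of the open block
        self.blocks = []      # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0 and (tag in HEADINGS or tag == "p"):
            self.prefix = HEADINGS.get(tag, "")
            self.parts = []

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (self.skip_depth == 0 and self.prefix is not None
              and (tag in HEADINGS or tag == "p")):
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self.parts).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.parts.append(data)


def html_to_markdown(html: str) -> str:
    parser = MainContentToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Feeding it a page with a navigation menu, an h1 title, one paragraph, and a footer returns just the title as a # heading followed by the paragraph text.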

What HTTP headers does the plugin send with the Markdown response?

The response includes:

  • Content-Type: text/markdown; charset=utf-8 — tells the client the content format.
  • Vary: Accept — informs caches that the response varies based on the Accept header, preventing cached HTML from being served to Markdown clients.
  • X-Markdown-Generator: Markdown for AI Agents — identifies the plugin.
  • X-Markdown-Word-Count: [number] — the word count of the Markdown content.
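
A client can use these headers to confirm it really received the Markdown representation. The following sketch takes a plain dictionary of response headers and reports anything unexpected; the function name check_markdown_headers is hypothetical, and the checks mirror only the headers listed above.

```python
def check_markdown_headers(headers: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the headers look right."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    problems = []

    if not h.get("content-type", "").lower().startswith("text/markdown"):
        problems.append("Content-Type is not text/markdown")

    vary = [value.strip().lower() for value in h.get("vary", "").split(",")]
    if "accept" not in vary:
        problems.append("Vary does not list Accept")

    if not h.get("x-markdown-word-count", "").strip().isdigit():
        problems.append("X-Markdown-Word-Count is missing or not a number")

    return problems


print(check_markdown_headers({
    "Content-Type": "text/markdown; charset=utf-8",
    "Vary": "Accept",
    "X-Markdown-Word-Count": "412",
}))
```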

Will this affect my SEO or page caching?

No. The Vary: Accept header is set on Markdown responses so that HTTP caches (including CDNs) correctly cache HTML and Markdown separately. Search engine crawlers do not send Accept: text/markdown headers, so they will always receive and index the normal HTML version of your pages.
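
The effect of Vary: Accept on a cache can be pictured with a toy model. Real caches and CDNs differ in how they normalise header values and whether they honour Vary at all, so treat this as an illustration of the principle under those assumptions, not a description of any specific product. The class name VaryAwareCache is hypothetical.

```python
class VaryAwareCache:
    """Toy HTTP cache that honours a `Vary: Accept` response header."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(url: str, accept: str) -> tuple[str, str]:
        # Because the response says "Vary: Accept", the request's Accept
        # value becomes part of the cache key alongside the URL.
        return (url, accept.strip().lower())

    def put(self, url: str, accept: str, body: str) -> None:
        self._store[self._key(url, accept)] = body

    def get(self, url: str, accept: str):
        return self._store.get(self._key(url, accept))


cache = VaryAwareCache()
url = "https://your-site.com/hello-world/"
cache.put(url, "text/html", "<html>...</html>")
cache.put(url, "text/markdown", "# Hello World")

print(cache.get(url, "text/markdown"))  # the Markdown entry, not the HTML one
```

Because the two representations live under different keys, a Markdown client never receives the cached HTML and a browser never receives the cached Markdown.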

Does the plugin require any additional libraries?

The League HTMLToMarkdown library is bundled inside the plugin under includes/lib/HtmlToMarkdown/. No additional installation steps are required.

Is this compatible with page builders or block themes?

Yes. Because the plugin captures the fully rendered HTML output (after WordPress and any theme or plugin has finished building the page), it works regardless of whether your content is built with the Classic Editor, Gutenberg blocks, Elementor, or any other page builder.

1.0.0

  • Initial release.