AI-Only Pages

by tommyoz12

Description

AI-Only Pages gives you granular control over which search engine bots can index each page on your WordPress site, while making those same pages easier for AI crawlers such as GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot to discover and use.

The core idea: you have content that suits AI training pipelines and retrieval-augmented generation (RAG) systems, but you do not want it competing for rankings in Google, Bing, or Yahoo. AI-Only Pages lets you mark those pages as AI-only. They carry noindex directives for traditional search engines and are left out of XML sitemaps, while AI crawlers can find them through /llms-index.txt and receive a cleaned, low-noise version of the HTML.

What it does

  • Per-bot noindex — Block individual bots (Googlebot, Bingbot, YandexBot, etc.) with a checkbox per bot per page. Checking a bot blocks only that bot; unchecked bots continue to index the page normally.
  • “Block All” master toggle — One click blocks all 10 supported search engine bots simultaneously.
  • <meta> tags and HTTP headers — For blocked bots, the plugin emits both bot-specific HTML meta tags (for example <meta name="googlebot" content="noindex, nofollow">) and X-Robots-Tag HTTP headers, so the directive reaches crawlers whether they parse the page markup or read only the response headers. This works on all public post types, including Pages and custom post types.
  • SEO plugin integration — Suppresses Yoast SEO, WP Core, and RankMath’s global <meta name="robots"> tag on AI-only pages so there is no conflict between the global tag and your per-bot tags.
  • Sitemap exclusion — AI-Only pages are automatically removed from all XML sitemaps (Yoast SEO and WP Core sitemaps are both supported).
  • /llms-index.txt — A plain-text AI discovery file served at yoursite.com/llms-index.txt listing all AI-only pages with their titles and last-modified dates. AI crawlers can use this file to find your AI-optimised content directly. Can be toggled on/off from the settings page.
  • Token Diet — clean AI output — When an AI crawler visits an AI-only page, the plugin serves a cleaned version of the HTML with navigation, sidebars, footers, cookie banners, inline styles, SVGs, and iframes stripped out. AI models receive pure content with minimal noise.
  • Global Settings Page — A top-level “AI-Only Pages” menu in the WordPress admin sidebar lets you configure Token Diet and LLM Index behaviour globally, without touching code.
  • Caching plugin notice — If WP Rocket, LiteSpeed Cache, or another full-page caching plugin is detected, an admin notice explains how to configure it to work alongside this plugin.
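To make the per-bot output concrete, here is a minimal sketch of how a set of ticked bots could map to meta tags and X-Robots-Tag header lines. The function name demo_noindex_directives and the lower-casing of bot names are assumptions for illustration, not the plugin's code; only the googlebot meta tag format is taken from the description above.

```php
<?php
// Hypothetical sketch: turn the bots ticked in the meta box into per-bot
// robots <meta> tags and X-Robots-Tag header lines. The function name and
// the lower-casing of bot names are assumptions, not the plugin's code.
function demo_noindex_directives(array $blocked_bots): array
{
    $meta    = [];
    $headers = [];
    foreach ($blocked_bots as $bot) {
        $token = strtolower($bot);
        // A bot-specific meta name (e.g. "googlebot") applies only to that crawler.
        $meta[] = sprintf(
            '<meta name="%s" content="noindex, nofollow">',
            htmlspecialchars($token, ENT_QUOTES, 'UTF-8')
        );
        // Header form: "X-Robots-Tag: <user-agent>: <directives>".
        $headers[] = sprintf('X-Robots-Tag: %s: noindex, nofollow', $token);
    }
    return ['meta' => $meta, 'headers' => $headers];
}

$out = demo_noindex_directives(['Googlebot', 'Bingbot']);
echo implode("\n", $out['meta']), "\n";
// Inside WordPress each header line would be sent with header( $line, false )
// before any output, so several X-Robots-Tag headers can coexist.
```

Passing false as the second argument to header() keeps earlier X-Robots-Tag lines instead of replacing them, which is what allows one header per blocked bot.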

The Settings Page

A full settings page is available under AI-Only Pages in the WordPress admin sidebar. It provides:

Section 1 — Instructions & Status: A “How It Works” guide covering the meta box, Token Diet, and LLM Index. A live, clickable URL to your /llms-index.txt file with a green/red status indicator showing whether the index is active.

Section 2 — LLM Index Settings: A toggle to enable or disable /llms-index.txt globally. When disabled, the endpoint returns a 404.

Section 3 — Token Diet Master Control: A master toggle to enable or disable Token Diet entirely. When off, AI bots receive raw, full HTML — identical to what human visitors see.

Section 4 — Granular Token Diet Stripping: Individual toggles for each category of content stripped:

  • Strip structural layout (headers, footers, sidebars, navigation, cookie banners)
  • Strip <style> tags and embedded CSS
  • Strip <svg> elements (major token bloaters)
  • Strip <iframe> elements (maps, embeds, social widgets)
  • Strip <form> elements (Warning: removes WooCommerce Add to Cart buttons)
  • Strip <script> tags (Note: application/ld+json schema is always preserved)
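The granular toggles above map naturally onto a list of XPath queries. The sketch below assumes that model: every query in the list is run against the page and each matching node is deleted, with scripts matched by a query that excludes application/ld+json. The function name and every query string other than '//form' (which appears in the developer examples) are assumptions.

```php
<?php
// Hypothetical sketch of Token Diet "Pass 2": delete every node matched by a
// list of XPath queries. The query list mirrors the settings toggles above,
// but the exact strings the plugin uses (apart from '//form') are assumptions.
function demo_strip_bloat_tags(string $html, array $queries): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // Silence warnings about HTML5/SVG tags.
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes === false) {
            continue; // Skip an invalid query rather than abort the page.
        }
        // Copy the matches into an array before removing any of them, so
        // removals cannot disturb the iteration.
        foreach (iterator_to_array($nodes) as $node) {
            $node->parentNode->removeChild($node);
        }
    }
    return $doc->saveHTML();
}

$queries = [
    '//style',
    '//svg',
    '//iframe',
    '//form',
    // Remove scripts, but keep JSON-LD structured data.
    '//script[not(@type="application/ld+json")]',
];

$html = '<html><head><style>p{color:red}</style>'
    . '<script type="application/ld+json">{"@type":"Article"}</script>'
    . '<script>track();</script></head>'
    . '<body><p>Hello</p><svg><circle r="4"></circle></svg>'
    . '<form><button>Add to cart</button></form></body></html>';

echo demo_strip_bloat_tags($html, $queries);
```

Turning a toggle off in this model simply means leaving its query out of the list, which is why disabling a toggle only lets more of the original HTML through.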

Supported Search Engine Bots

Googlebot (Web), Googlebot-Image, Googlebot-News, Googlebot-Video, AdsBot-Google, Bingbot, Slurp (Yahoo), DuckDuckBot, Baiduspider, YandexBot.

AI Bots Welcomed

GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, YouBot, Meta-ExternalAgent, Amazonbot, Bytespider, Diffbot, cohere-ai, anthropic-ai, AI2Bot, OAI-SearchBot, and more. These bots are detected automatically and served cleaned content when they visit an AI-only page.
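Signature-based detection (the "Layer 1" referred to in the Developer Reference) can be pictured as a substring match of the User-Agent against a list of known crawler names. This sketch assumes a case-insensitive match; the function name demo_is_known_ai_crawler and the sample User-Agent strings are illustrative only.

```php
<?php
// Hypothetical sketch of "Layer 1" detection: a case-insensitive substring match
// of the User-Agent against a list like the one exposed by the
// aionly_ai_crawler_signatures filter. Case-insensitivity is an assumption.
function demo_is_known_ai_crawler(string $user_agent, array $signatures): bool
{
    foreach ($signatures as $signature) {
        // Guard against '' (stripos reports a match at offset 0 on PHP 8).
        if ($signature !== '' && stripos($user_agent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

$signatures = ['GPTBot', 'ChatGPT-User', 'OAI-SearchBot', 'ClaudeBot', 'PerplexityBot'];

// Sample User-Agent strings, for illustration only.
var_dump(demo_is_known_ai_crawler('Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)', $signatures)); // bool(true)
var_dump(demo_is_known_ai_crawler('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', $signatures)); // bool(false)
```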

Developer-Friendly

Every major behaviour is extensible via WordPress filters. See the Developer Reference section below. The Settings class hooks into filters at priority 5, leaving priorities 10 and above free for developer overrides — so your custom add_filter() calls always win.

Using the Plugin

Per-page control

  1. Open any post or page in the WordPress editor.
  2. Find the AI-Only Pages meta box in the right sidebar.
  3. Check individual bots to block them, or use Block from ALL search engine bots to check all at once.
  4. Click Publish or Update to save. The noindex tags take effect immediately.
  5. Visit yoursite.com/llms-index.txt to confirm your page appears in the AI content index.

Note: The master toggle requires JavaScript. The individual checkboxes always work regardless of JS state.

Global settings

  1. Go to AI-Only Pages in the WordPress admin sidebar.
  2. Review the “How It Works” section and confirm your /llms-index.txt URL is live.
  3. Use the LLM Index Settings card to enable or disable the discovery file.
  4. Use the Token Diet — Master Control card to enable or disable all output cleaning.
  5. Use the Token Diet — Granular Stripping card to select exactly which HTML elements are stripped from AI output.
  6. Click Save Settings.

Developer Reference

All filters are applied inside AIOnly\Pages\Plugin. The Settings class hooks at priority 5; standard developer priority is 10+.

aionly_ai_crawler_signatures
Array of User-Agent substrings used for Layer 1 bot detection.
@param string[] $signatures
@return string[]

aionly_strip_selectors
CSS-style selector strings passed to Pass 1 of Token Diet (structural removal).
Supports element tag, #id, and .class (one class, no combinators).
@param string[] $selectors
@return string[]

aionly_strip_token_bloat_tags
XPath query strings passed to Pass 2 of Token Diet (tag removal).
@param string[] $queries
@return string[]

aionly_allowed_attributes
HTML attribute names kept on every element by Pass 3 of Token Diet.
Everything else is stripped.
@param string[] $attributes
@return string[]
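An allowlist pass like this is typically a walk over every element that removes any attribute not on the list. The sketch below assumes that approach and collects attribute names before removing them, the same defensive pattern the 1.2.0 changelog describes. The function name and the example allowlist are hypothetical, not the plugin's defaults.

```php
<?php
// Hypothetical sketch of Token Diet "Pass 3": remove every attribute that is
// not on an allowlist such as the one exposed by aionly_allowed_attributes.
// The allowlist below is an example, not the plugin's default.
function demo_strip_attributes(DOMDocument $doc, array $allowed): void
{
    $allowed = array_map('strtolower', $allowed);
    foreach (iterator_to_array($doc->getElementsByTagName('*')) as $element) {
        // Collect names first: removing attributes while looping over the
        // attribute map would reindex it and skip entries.
        $names = [];
        foreach ($element->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowed, true)) {
                $element->removeAttribute($name);
            }
        }
    }
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<html><body><p class="lead" style="color:red">See '
    . '<a href="/guide" onclick="track()">the guide</a></p></body></html>');
libxml_clear_errors();

demo_strip_attributes($doc, ['href', 'src', 'alt']);
echo $doc->saveHTML();
```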

aionly_should_clean_output
Boolean. Return false to disable Token Diet entirely for a specific post.
@param bool $enabled Default: true.
@param \WP_Post $post
@return bool

aionly_enable_xrobots_headers
Boolean. Return false to suppress X-Robots-Tag HTTP headers.
@param bool $enabled Default: true.
@param \WP_Post $post
@return bool

aionly_cache_ttl
Filter the transient TTL in seconds.
@param int $ttl Default: 600 (10 minutes).
@return int

aionly_llms_index_lines
Filter the array of text lines that make up llms-index.txt before output.
@param string[] $lines Array of lines (including comment lines).
@param int[] $active_ids Post IDs included in the index.
@return string[]
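Because aionly_llms_index_lines passes two arguments, a callback must be registered with an accepted-args count of 2 to receive $active_ids. The sketch below is such a callback, run stand-alone on made-up input. The "#" comment prefix and the example.com policy URL are assumptions, since the exact file format is not specified here.

```php
<?php
// Example callback for aionly_llms_index_lines. Inside WordPress you would
// register it with add_filter( 'aionly_llms_index_lines', $append_footer, 10, 2 );
// the trailing 2 is needed for WordPress to pass $active_ids as well.
// The '#' comment prefix and the policy URL are assumptions for illustration.
$append_footer = function (array $lines, array $active_ids): array {
    $lines[] = '# Pages listed: ' . count($active_ids);
    $lines[] = '# AI usage policy: https://example.com/ai-policy';
    return $lines;
};

// Stand-alone run with made-up input (no WordPress needed).
$result = $append_footer(['# Example index', 'Hello World (last modified 2026-03-01)'], [12, 42]);
echo implode("\n", $result), "\n";
```

Without the 10, 2 arguments, WordPress would call the closure with only $lines, and PHP would reject the call because $active_ids is a required parameter.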

aionly_supported_post_types
Array of public post type slugs the plugin should support.
@param string[] $post_types
@return string[]

aionly_use_heuristic_bot_detection
Boolean. Return false to disable Layer 2 heuristic bot detection.
@param bool $enabled Default: true.
@return bool

Code Examples

Disable heuristic bot detection (for example, if uptime monitors are being treated as AI bots):

add_filter( 'aionly_use_heuristic_bot_detection', '__return_false' );

Preserve WooCommerce forms (developer override — wins over settings page):

add_filter( 'aionly_strip_token_bloat_tags', function( $queries ) {
    return array_filter( $queries, function( $q ) {
        return $q !== '//form';
    } );
} );

Add a custom strip selector:

add_filter( 'aionly_strip_selectors', function( $selectors ) {
    $selectors[] = '.advertisement';
    $selectors[] = '#newsletter-popup';
    return $selectors;
} );

Keep class attributes in AI output:

add_filter( 'aionly_allowed_attributes', function( $attrs ) {
    $attrs[] = 'class';
    return $attrs;
} );

Add a custom AI crawler signature:

add_filter( 'aionly_ai_crawler_signatures', function( $sigs ) {
    $sigs[] = 'FutureBot';
    return $sigs;
} );

Restrict to specific post types:

add_filter( 'aionly_supported_post_types', function( $types ) {
    return [ 'post', 'page' ]; // Only posts and pages.
} );

Disable Token Diet on a specific post (always wins, priority 10 > settings priority 5):

add_filter( 'aionly_should_clean_output', function( $enabled, $post ) {
    if ( 42 === $post->ID ) {
        return false; // Post 42 serves full HTML to AI bots.
    }
    return $enabled;
}, 10, 2 );

Read a single setting value in custom code:

$token_diet_on = '1' === \AIOnly\Pages\Settings::get( 'token_diet_enabled' );
$all_settings  = \AIOnly\Pages\Settings::get_settings(); // Full array.

Automatic installation

  1. In your WordPress admin, go to Plugins → Add New.
  2. Search for “AI-Only Pages”.
  3. Click Install Now, then Activate.
  4. After activation, go to Settings → Permalinks and click Save Changes to flush rewrite rules so /llms-index.txt begins working immediately.
  5. Visit AI-Only Pages in the sidebar to configure global settings.

Manual installation

  1. Download the plugin zip file.
  2. Unzip it and upload the ai-only-pages folder to /wp-content/plugins/.
  3. Activate the plugin from the Plugins menu.
  4. Go to Settings → Permalinks and click Save Changes.

Plugin folder structure

After installation the plugin occupies exactly this structure inside
/wp-content/plugins/ai-only-pages/:

ai-only-pages/
├── ai-only-pages.php           Root loader. Contains the plugin header WordPress
│                               reads for name/version. Performs PHP and WP version
│                               gates. Registers activation/deactivation hooks.
│                               Contains zero modern PHP syntax so it is safe to
│                               parse on PHP 5.x without fatal errors.
│
├── includes/
│   ├── class-plugin.php        The core plugin class. All bot detection, meta
│   │                           boxes, output buffering, Token Diet, LLM Index,
│   │                           and SEO plugin overrides live here.
│   │
│   └── class-settings.php      The Settings class. Registers the top-level admin
│                               menu, the settings page, and all WordPress Settings
│                               API fields. Hooks into core plugin filters at
│                               priority 5 to alter behaviour dynamically from
│                               saved options.
│
├── assets/
│   └── js/
│       └── admin.js            Vanilla JavaScript for the meta box. Handles the
│                               "Block from ALL" master toggle and keeps it in
│                               sync with individual bot checkboxes. No jQuery.
│                               Enqueued only on post.php and post-new.php.
│
├── uninstall.php               Clean removal. Deletes all plugin options and
│                               post meta when the plugin is deleted via the
│                               WordPress admin Plugins screen.
│
├── LICENSE                     Full GPLv2 license text.
│
└── readme.txt                  This file.

Why this structure?

The split between ai-only-pages.php and class-plugin.php is intentional and critical. WordPress reads the root plugin file's header (Plugin Name:, Version:, etc.) as plain text, then includes the file, at which point PHP must parse all of it before any of its code runs. If the root file used modern PHP syntax and the site ran PHP 7.0, PHP would throw a fatal parse error before the version gate could display a helpful admin notice. Keeping the root file at PHP 5.4 syntax means the gate always runs and users see a readable error instead of a white screen.

Both class-plugin.php and class-settings.php use PHP 7.4+ syntax. The root loader loads them only after the version gates have passed, and because PHP parses a file only when it is included, an older interpreter never encounters that syntax. No manual wiring is required.
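A version gate of this kind can be written entirely in old syntax and defer every modern file to a later include. The sketch below assumes a simple PHP_VERSION check against 7.4. The function name demo_aionly_php_ok is hypothetical, and the require_once line is left commented out because it only illustrates where the class files would be loaded.

```php
<?php
// Hypothetical root-loader gate written in PHP 5.4-compatible syntax: no scalar
// type declarations, no ?? operator, no arrow functions. The function name and
// the commented-out require are illustrative.
if ( ! function_exists( 'demo_aionly_php_ok' ) ) {
    function demo_aionly_php_ok( $php_version ) {
        return version_compare( $php_version, '7.4', '>=' );
    }
}

if ( demo_aionly_php_ok( PHP_VERSION ) ) {
    // Safe: PHP only parses a file when it is included, so 7.4+ syntax in the
    // class files is never seen by an older interpreter.
    // require_once __DIR__ . '/includes/class-plugin.php';
    echo "Requirements met; loading plugin classes.\n";
} else {
    // In WordPress this branch would register an admin notice instead.
    echo "PHP 7.4 or newer is required.\n";
}
```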

Frequently Asked Questions

I blocked Googlebot but Google is still indexing the page. Why?

Google may have crawled and cached the page before you marked it as AI-only. Google only sees the new noindex directive the next time it recrawls the page, which can take days or weeks. If you need faster removal, use the Removals tool in Google Search Console.

Also verify that the noindex tag is actually appearing on the page: view source and search for <meta name="googlebot".

The noindex tags are not appearing on my Pages (not Posts).

This was a bug fixed in version 1.3.1. Both output_noindex_tags() and output_xrobots_headers() incorrectly checked publicly_queryable to decide whether to proceed. WordPress’s built-in page post type has publicly_queryable = false, causing both methods to silently bail out without writing any tags. Update to 1.3.1 to resolve this.

Does this work with Yoast SEO / RankMath?

Yes. The plugin overrides the global <meta name="robots"> tag that Yoast SEO and RankMath output on AI-only pages. Without this override, Yoast might output <meta name="robots" content="index, follow"> which would conflict with the per-bot tags. On AI-only pages, the global tag is suppressed entirely; only the per-bot tags remain.

/llms-index.txt shows a 404. How do I fix it?

First, check that LLM Index is enabled on the AI-Only Pages settings page.

If it is enabled, go to Settings → Permalinks and click Save Changes without changing anything. This flushes WordPress’s rewrite rules so the /llms-index.txt URL pattern is registered.

This flush happens automatically on plugin activation, but some server configurations (particularly Nginx without try_files) may need a manual flush or a server config update.

My caching plugin is serving the same page to both humans and AI bots.

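Full-page caching plugins (WP Rocket, LiteSpeed Cache, W3 Total Cache, etc.) serve stored HTML before WordPress finishes loading, often before it loads plugins at all. On a cached page, this plugin's bot detection and output buffer never run, so humans and AI bots receive the same cached copy.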

To fix this, configure your caching plugin to exclude AI-Only page URLs from its cache:

WP Rocket: Settings → Cache → Never Cache URL(s). Add the slug of each AI-only page.

LiteSpeed Cache: LiteSpeed Cache → Cache → Do not cache URIs.

W3 Total Cache: Performance → Page Cache → Never cache the following pages.

Alternatively, add a custom rule to exclude pages with the _aionly_active cookie, or contact your host’s support team — managed WordPress hosts often expose this setting in their dashboard.

My uptime monitor or API client is being treated as an AI bot.

The plugin uses two-layer bot detection. Layer 1 matches known AI crawler signatures in the User-Agent. Layer 2 (heuristic) flags requests whose User-Agent contains no browser-engine string AND that carry no Accept-Language header. Real browsers send both, but many CLI tools and monitoring services send neither.

The simplest fix is to configure your monitoring tool to send an Accept-Language header. Alternatively, disable heuristic detection entirely:

add_filter( 'aionly_use_heuristic_bot_detection', '__return_false' );
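The heuristic can be pictured as two checks that must both fail before a request is flagged. This sketch assumes a short list of browser-engine tokens; the function name demo_looks_like_non_browser and the token list are hypothetical, not the plugin's actual implementation.

```php
<?php
// Hypothetical sketch of the Layer 2 heuristic: flag a request only when the
// User-Agent carries no browser-engine token AND the Accept-Language header is
// missing or empty. The token list is an assumption.
function demo_looks_like_non_browser(string $user_agent, string $accept_language): bool
{
    foreach (['Gecko', 'AppleWebKit', 'Trident', 'Presto'] as $engine) {
        if (stripos($user_agent, $engine) !== false) {
            return false; // Carries a browser-engine token: not flagged.
        }
    }
    return trim($accept_language) === '';
}

var_dump(demo_looks_like_non_browser('curl/8.5.0', ''));      // bool(true)
var_dump(demo_looks_like_non_browser('curl/8.5.0', 'en-US')); // bool(false)
```

The second call shows why adding an Accept-Language header to a monitor is enough: once either check passes, the request is no longer flagged.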

WooCommerce add-to-cart buttons are missing on AI pages. Is that normal?

Yes, if “Strip <form> elements” is enabled in the settings (it is by default). WooCommerce add-to-cart buttons are rendered inside <form> elements, and AI crawlers only read content; they cannot submit forms. If you want AI crawlers to see your product CTAs, turn off “Strip <form> elements” on the AI-Only Pages settings page, or add a developer filter:

add_filter( 'aionly_strip_token_bloat_tags', function( $queries ) {
    return array_filter( $queries, function( $q ) {
        return $q !== '//form';
    } );
} );

Will disabling individual stripping toggles break the page for AI crawlers?

No. Disabling a toggle simply passes more of the original HTML through to the AI crawler. The page is never broken — it may just contain more noise that uses up the crawler’s context window. The defaults are optimised for maximum signal-to-noise ratio.

Do settings-page changes require me to flush permalinks?

No. Settings only affect the output buffer and filter callbacks — they have no impact on WordPress rewrite rules. Changes take effect on the very next AI crawler request.

Where is the settings data stored?

All settings are stored in a single wp_options row with the key aionly_pages_settings as a serialised array. You can inspect or export it like any other WordPress option.
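One way to picture a single serialised settings row is as a stored array merged over defaults on every read. The sketch below assumes that model. Only the key token_diet_enabled and its '1' string value come from this readme; the other keys, the defaults, and the function name demo_get_setting are invented for illustration.

```php
<?php
// Hypothetical sketch: every setting lives in one option row as a serialised
// array, and reads fall back to defaults for keys that were never saved.
// Only 'token_diet_enabled' is named in this readme; the other keys are invented.
const DEMO_SETTING_DEFAULTS = [
    'token_diet_enabled' => '1',
    'llms_index_enabled' => '1',
    'strip_forms'        => '1',
];

function demo_get_setting(string $serialised_row, string $key)
{
    // In WordPress, get_option() would unserialise the row for you.
    $stored = unserialize($serialised_row, ['allowed_classes' => false]);
    if (!is_array($stored)) {
        $stored = [];
    }
    // Saved values override defaults; unsaved keys keep their default.
    $settings = array_merge(DEMO_SETTING_DEFAULTS, $stored);
    return $settings[$key] ?? null;
}

// What the row might hold after switching Token Diet off on the settings page.
$row = serialize(['token_diet_enabled' => '0']);
var_dump(demo_get_setting($row, 'token_diet_enabled')); // string(1) "0"
var_dump(demo_get_setting($row, 'strip_forms'));        // string(1) "1"
```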

How does the Settings class hook into the core plugin?

class-settings.php uses the same public add_filter() hooks that the core plugin exposes to developers. Specifically:

  • aionly_should_clean_output — used to disable Token Diet when the master toggle is off.
  • aionly_strip_token_bloat_tags — used to build a dynamic XPath query array from granular toggles.
  • aionly_strip_selectors — used to empty the structural selector list when layout stripping is off.
  • template_redirect at priority 0 — used to return a 404 for /llms-index.txt when the LLM Index is disabled.

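The three filter callbacks above run at priority 5. The template_redirect callback runs at priority 0 so that it fires before other template_redirect callbacks. Developer overrides added at priority 10 (the WordPress default) therefore run after the settings callbacks and take precedence. Your custom filters always win.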
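The "your filters always win" rule follows from how filter chains work: callbacks run in ascending priority order, and each receives the value returned by the previous one. The toy registry below models only that ordering. It is not WordPress's implementation, DemoFilters is a hypothetical class, and it passes a post ID where the real filter passes a \WP_Post.

```php
<?php
// Toy model of WordPress filter priorities (NOT WordPress's implementation):
// callbacks run in ascending priority order, each receiving the previous
// callback's return value, so the highest-numbered priority has the last word.
final class DemoFilters
{
    private array $callbacks = [];

    public function add(string $hook, callable $cb, int $priority = 10): void
    {
        $this->callbacks[$hook][$priority][] = $cb;
    }

    public function apply(string $hook, $value, ...$args)
    {
        $byPriority = $this->callbacks[$hook] ?? [];
        ksort($byPriority); // Lowest priority number runs first.
        foreach ($byPriority as $callbacks) {
            foreach ($callbacks as $cb) {
                $value = $cb($value, ...$args);
            }
        }
        return $value;
    }
}

$filters = new DemoFilters();
// Settings page (priority 5): master toggle is on, so keep cleaning.
$filters->add('aionly_should_clean_output', function ($enabled) { return true; }, 5);
// Developer override (priority 10): turn cleaning off for post 42.
// (The real filter passes a \WP_Post; this toy passes a post ID.)
$filters->add('aionly_should_clean_output', function ($enabled, $postId) {
    return $postId === 42 ? false : $enabled;
}, 10);

var_dump($filters->apply('aionly_should_clean_output', true, 42)); // bool(false)
var_dump($filters->apply('aionly_should_clean_output', true, 7));  // bool(true)
```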

Changelog

1.3.3 — 2026-03-11

  • Fixed: Missing assets/js/admin.js — the meta box “Block from ALL” master toggle was non-functional in 1.3.2 due to the JavaScript file being omitted from the release package.
  • Added: uninstall.php — clean removal of all plugin data (aionly_pages_settings option and all _aionly_* post meta) when the plugin is deleted via the WordPress admin.
  • Added: LICENSE file (GPLv2 full text).
  • Updated: “Tested up to” header bumped to WordPress 6.9.2.

1.3.2 — 2026-03-01

  • Fixed: Output buffer opened by Token Diet (ob_start()) is now explicitly closed via a shutdown hook, preventing potential buffer-stack conflicts with other plugins. Addresses WordPress.org Plugin Review Team feedback.

1.3.1 — 2026-02-20

  • Fixed: Noindex <meta> tags and X-Robots-Tag headers were not emitted on WordPress Pages and non-post custom post types. Both methods incorrectly checked publicly_queryable — WordPress’s built-in page post type has this set to false, causing both to silently return without writing any tags. Fixed by checking public instead.
  • Fixed: Settings page CSS was not loading on some WordPress setups. wp_add_inline_style() was attached to the wp-admin handle which is not guaranteed to be registered in the required state. Fixed by registering a dedicated aionly-settings-ui handle and attaching inline CSS to that.
  • Fixed: Settings admin menu was not appearing because class-settings.php was missing its require_once in the root loader.
  • Fixed: Removed placeholder Plugin URI header pointing to a non-existent URL, which produced a broken “Visit plugin site” link in the Plugins list.
  • Cleaned: Removed dead add_settings_section() and add_settings_field() calls that had no effect since do_settings_sections() is never called.
  • Security: $_SERVER['HTTP_USER_AGENT'] and $_SERVER['HTTP_ACCEPT_LANGUAGE'] now passed through sanitize_text_field( wp_unslash() ) before use.
  • i18n: Added load_plugin_textdomain() so translations load correctly.

1.3.0 — 2026-02-19

  • New: includes/class-settings.php — full WordPress Settings API integration adding a top-level “AI-Only Pages” admin menu with four visual cards.
  • New: LLM Index toggle — enable/disable /llms-index.txt globally. When disabled, the endpoint returns a 404.
  • New: Token Diet master toggle — enable/disable all AI output cleaning globally without touching code.
  • New: Six granular Token Diet toggles — independently control stripping of structural layout, <style> tags, <svg> elements, <iframe> elements, <form> elements, and <script> tags (with the schema application/ld+json preservation guarantee always enforced).
  • New: Live /llms-index.txt URL displayed on the settings page with a green/red status badge.
  • New: “How It Works” explainer built into the settings page — no need to consult the readme for basic orientation.
  • Architecture: Settings class hooks into core plugin filters at priority 5, ensuring developer add_filter() calls at priority 10+ always override settings-page values.
  • Architecture: All settings stored in one wp_options row (aionly_pages_settings) to minimise database overhead.

1.2.1 — 2026-02-19

  • Fixed: Restored Yoast SEO, WP Core, and RankMath global robots override filters that were inadvertently removed in v1.2.0. Without these, Yoast’s global <meta name="robots" content="index, follow"> tag overrode per-bot noindex tags — the core feature was broken for sites using Yoast or RankMath.
  • Fixed: Double-encoding bug in the “AI-optimized & listed” status badge. esc_html_e() was applied to a string already containing &amp;, producing &amp;amp;, which rendered as the literal text “&amp;” in the browser.
  • Fixed: save_meta_data() now always syncs the _aionly_active derived flag on every valid save, not only when individual bot values change. This self-heals any flag desync caused by direct DB edits, imports, or third-party plugins.
  • Fixed: Pro upsell link now includes rel="noopener noreferrer" on target="_blank" to prevent reverse tabnapping.
  • Improved: admin.js now listens to the change event instead of click. The change event is the semantically correct event for checkbox state and handles keyboard (Space bar) and programmatic changes correctly.
  • Improved: Added function_exists() guards to all global functions in the root file to prevent “Cannot redeclare function” fatal errors if the file is somehow processed more than once.

1.2.0 — 2026-02-19

  • Fixed: Asset path bug — PHP enqueued assets/js/admin.js but the file was located at assets/admin.js. The JS file 404’d and the “Block from ALL” button was dead on arrival.
  • Fixed: DOMContentLoaded wrapper removed from admin.js. Scripts enqueued with in_footer=true execute after that event fires; the callback was never running.
  • Fixed: admin.js now uses classList API instead of fragile className.indexOf() string matching for class detection.
  • Fixed: Restored missing before_delete_post cache-clearing logic that was inadvertently merged with transition_post_status into a single variadic function.
  • Fixed: Heuristic bot detection (Layer 2) restored after it was silently removed in a prior refactor.
  • Fixed: junk_queries loop now uses iterator_to_array() to snapshot the live DOMNodeList before iterating. Iterating a live list while removing nodes caused silent skips.
  • Improved: All inline attribute iteration now collects attributes into an array before removal, preventing NamedNodeMap reindexing skips.
  • Improved: Post ID resolved explicitly from $_GET['post'] / $_POST['post_ID'] in enqueue_admin_assets(), removing reliance on the implicit global $post.

1.1.6 — 2026-02-19

  • Major stability pass following an external AI code review that missed three deployment-blocking bugs while reporting “zero bugs found.”
  • Fixed JS path (assets/js/ subfolder), DOMContentLoaded timing, and class name mismatch between PHP and JS.

1.1.3 — 2026-02-19

  • Fixed plugin_dir_url() path calculation using dirname(dirname(__FILE__)).
  • Fixed XPath injection via unsanitized filter values in aionly_strip_token_bloat_tags.
  • Fixed get_post() fragility — now resolves post ID explicitly from request superglobals.

1.1.2 — 2026-02-19

  • Introduced Token Diet V2 with three-pass HTML cleaning (structural, bloat, attributes).
  • Added master “Block All” toggle with JavaScript event delegation.
  • Added X-Robots-Tag HTTP headers alongside HTML meta tags.
  • Added Yoast SEO, WP Core, and RankMath robots override filters.
  • Added heuristic bot detection layer (no browser UA markers + no Accept-Language).

1.1.0

  • Added /llms-index.txt discovery file with transient caching.
  • Added caching plugin compatibility notice.
  • Added sitemap exclusion for Yoast SEO and WP Core sitemaps.

1.0.3

  • Fixed nonce verification — nonces are now post-specific to prevent cross-post replay.
  • Fixed capability check using post type’s own capability type.
  • Added transition_post_status hook to clear transient cache on post status changes.

1.0.1

  • Initial public release.
  • Per-bot noindex meta box with 10 supported search engine bots.
  • Transient-cached active post ID query.
  • Activation/deactivation rewrite rule management.