feat: add search_docs tool for searching Plane documentation #158

Open
sriramveeraghanta wants to merge 2 commits into main from feat/search-docs-tool

@sriramveeraghanta (Member) commented Jun 19, 2026

Summary

Adds a search_docs MCP tool that runs local full-text search over Plane's two public documentation sites and returns ranked results with snippets and URLs.

search_docs(query: str, source: "all" | "help" | "developer" = "all", limit: int = 5) -> dict
  • help → docs.plane.so (product/usage) · developer → developers.plane.so (API reference, self-hosting, OAuth, webhooks, MCP) · all → both.
  • Fetches each site's Mintlify llms-full.txt, caches it in-process (1h TTL, with a double-checked lock so concurrent calls don't double-fetch), splits it into per-page records, and ranks by saturated, field-weighted term frequency (title > description > body, plus a coverage boost when all query terms match) so a focused page beats a long page that merely repeats a term.
  • Returns each match as {title, url, source, score, snippet}. Fetch failures are surfaced as a non-fatal error/warnings field rather than crashing the tool.
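
A minimal sketch of the ranking idea described above, not the PR's actual `_score_page`. The 6.0 / 3.0 / 2.0 field weights match the scoring loop quoted in the review of this PR; the saturation curve `hits / (hits + 1)`, the 1.5x coverage multiplier, and the names `saturate`, `tokenize`, and `score_page` are assumptions made for illustration. The sketch counts whole tokens rather than substrings.

```python
import re

# Field weights as they appear in the scoring loop quoted in the review.
WEIGHTS = {"title": 6.0, "description": 3.0, "body": 2.0}
# Assumption: the PR text does not state the size of the coverage boost.
COVERAGE_BOOST = 1.5


def saturate(hits: int) -> float:
    """Map a raw hit count into [0, 1) so repeated hits give diminishing returns."""
    return hits / (hits + 1.0)  # assumption: one common saturation curve


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(query: str, page: dict[str, str]) -> float:
    """Score one page: weighted, saturated term frequency plus a coverage boost."""
    query_tokens = set(tokenize(query))
    fields = {name: tokenize(page.get(name, "")) for name in WEIGHTS}
    score = 0.0
    matched_terms = 0
    for token in query_tokens:
        hits = {name: words.count(token) for name, words in fields.items()}
        if any(hits.values()):
            matched_terms += 1
        score += sum(WEIGHTS[name] * saturate(n) for name, n in hits.items())
    # Reward pages that contain every query term at least once.
    if query_tokens and matched_terms == len(query_tokens):
        score *= COVERAGE_BOOST
    return score
```

Because each field's contribution is capped by `saturate`, a page that repeats one term fifty times in its body cannot outscore a short page that matches every query term.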

Why

Lets assistants answer "how do I…" (product) and "how do I build on Plane…" (API/self-host) questions by citing the official docs, instead of relying on stale model knowledge.

Implementation notes

  • New module plane_mcp/tools/docs.py, registered via tools/__init__.py (matches the existing per-domain pattern). No Plane auth — the docs are public.
  • Pure functions (_parse_llms_full, _parse_yaml_fields, _score_page, _make_snippet, _search) are split from I/O so the parser and ranker are unit-tested with no network.
  • Handles YAML block-scalar frontmatter (e.g. url: >- on API-reference pages), strips .html suffixes for clean browsable URLs, and decodes HTML entities in titles/snippets.
  • Adds httpx as an explicit dependency (already transitive via fastmcp).
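
The frontmatter handling listed above could look roughly like the sketch below. It is a simplified stand-in, not the PR's `_parse_yaml_fields`: the function names `parse_fields` and `normalize`, the flat `key: value` layout, and the sample input are assumptions, and only folded block scalars (`>` and `>-`) are handled.

```python
import html


def parse_fields(block: str) -> dict[str, str]:
    """Parse flat `key: value` lines, folding `>` / `>-` block scalars into one line."""
    fields: dict[str, str] = {}
    lines = block.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        # Skip stray indented lines and lines that are not key/value pairs.
        if ":" not in line or line.startswith((" ", "\t")):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value in (">", ">-"):
            parts = []
            # Indented continuation lines belong to the block scalar.
            while i < len(lines) and lines[i].startswith((" ", "\t")):
                parts.append(lines[i].strip())
                i += 1
            value = " ".join(parts)
        fields[key.strip()] = value.strip("'\"")
    return fields


def normalize(fields: dict[str, str]) -> dict[str, str]:
    """Decode HTML entities in the title and drop a trailing .html from the URL."""
    return {
        "title": html.unescape(fields.get("title", "")),
        "url": fields.get("url", "").removesuffix(".html"),
    }


# Illustrative input only; the real llms-full.txt layout may differ.
sample = (
    'title: "Tips &amp; tricks"\n'
    "url: >-\n"
    "  https://example.com/api-reference/issue/add-issue.html\n"
    "description: Add an issue\n"
)
page = normalize(parse_fields(sample))
```

Keeping both functions free of I/O is what allows them to be exercised in unit tests without touching the network.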

Testing

  • 15 new unit tests in tests/test_docs_search.py (parser, tokenizer, ranking, source filter, limits, error handling) — no network required.
  • Full suite passes (excluding the live-credential integration test); ruff check and ruff format clean.
  • Verified end-to-end against the live sites. Example results:
    • "rest api create work item" → /api-reference/issue/add-issue
    • "self host docker compose" → /self-hosting/methods/docker-compose
    • "how to create a cycle" → /core-concepts/cycles

Summary by CodeRabbit

  • New Features

    • Added a documentation search tool that can return ranked results, snippets, or full page text for Plane docs.
    • Documentation pages can now be searched across help and developer content sources.
  • Bug Fixes

    • Improved guidance so documentation is checked first for how-to, why, and what questions before action-oriented requests.
    • Added more reliable handling for partial documentation fetch issues.
  • Tests

    • Added offline coverage for parsing, ranking, filtering, and full-text search behavior.

Add a `search_docs` MCP tool that runs local full-text search over Plane's
two public documentation sites and returns ranked results with snippets.

- docs.plane.so (product/help) and developers.plane.so (API, self-hosting,
  OAuth, webhooks, MCP), selectable via a `source` filter ("all" by default).
- Fetches each site's Mintlify llms-full.txt, caches it in-process (1h TTL),
  splits it into per-page records, and ranks by saturated, field-weighted
  term frequency (title > description > body) so coverage beats repetition.
- Handles YAML block-scalar frontmatter, strips .html suffixes, and decodes
  HTML entities in titles/snippets.
- Fetch failures are surfaced as a non-fatal error/warnings field.

Adds httpx as an explicit dependency and unit tests for the parser and ranker
(no network required).
@sriramveeraghanta sriramveeraghanta changed the base branch from canary to main June 19, 2026 15:31
coderabbitai Bot commented Jun 19, 2026

📝 Walkthrough

Adds a new documentation search tool, wires it into MCP registration and server instructions, updates the README, adds an httpx dependency, and includes offline tests for parsing, search, and full-text output.

Changes

Documentation Search Feature

| Layer / File(s) | Summary |
| --- | --- |
| Corpus parsing (plane_mcp/tools/docs.py) | Parses llms-full.txt pages into DocPage records, extracting URLs, titles, descriptions, and body content from frontmatter blocks. |
| Search scoring, snippets, and caching (plane_mcp/tools/docs.py) | Tokenizes queries, scores pages, builds snippets or full content, fetches the docs corpora with httpx, and caches parsed results in memory. |
| MCP registration and guidance (plane_mcp/tools/docs.py, plane_mcp/tools/__init__.py, plane_mcp/instructions.py, README.md, pyproject.toml) | Registers search_docs with FastMCP, adds the tool to the server instruction flow and README, and adds httpx as a runtime dependency. |
| Offline search tests (tests/test_docs_search.py) | Adds offline tests for parsing, tokenization, ranking, source filtering, fetch errors, and full_text response shape. |

Sequence Diagram(s)

sequenceDiagram
  participant MCPClient
  participant FastMCP
  participant search_docs
  participant DocsPlane as docs.plane.so/llms-full.txt
  participant DevPlane as developers.plane.so/llms-full.txt

  MCPClient->>FastMCP: ask for a how/what/why answer
  FastMCP->>search_docs: invoke query, source, limit, full_text
  search_docs->>DocsPlane: GET llms-full.txt
  search_docs->>DevPlane: GET llms-full.txt
  search_docs->>search_docs: parse pages, score matches, build snippet or content
  search_docs-->>FastMCP: ranked results with optional error/warnings
  FastMCP-->>MCPClient: tool response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Prashant-Surya

Poem

A bunny bounced through docs by moonlight,
Sniffing snippets snug and bright.
search_docs chirped, “Hop this way!”
Full text, ranks, and pages at play.
Now Plane lore nibbles neat and light 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.59% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding the search_docs tool for Plane documentation search. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plane_mcp/tools/docs.py`:
- Around line 347-377: `search_docs` currently returns a raw dict and does not
follow the `plane_mcp/tools` contract. Update
`register_docs_tools`/`search_docs` to call `get_plane_client_context()` first,
then use the returned client and workspace slug when performing the docs lookup.
Replace the dict return with a plane-sdk Pydantic response model for the search
results so tool registration and schema stay consistent with the rest of
`plane_mcp/tools`.
- Around line 233-239: The scoring logic in docs.py is using raw substring
counting in the query matching path, so terms like api can match unrelated words
and skew ranking/snippets. Update the matching in the token loop inside the
search/ranking code (including the related excerpt logic around the referenced
matching helper) to use tokenized comparisons or word-boundary matching instead
of str.count/find on the full lowercase text, and keep the scoring based on true
word hits only.
- Around line 291-303: The _get_corpus cache only stores successful fetches, so
repeated _search() calls will keep retrying a failing docs source on every
request. Update _get_corpus to record a short-lived negative cache/backoff entry
for _fetch_corpus failures, and have the cached lookup in _CACHE distinguish
between a normal pages result and a recent failure so search_docs can skip
hammering the same source during outages while still retrying after the backoff
expires.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: abca5c50-9436-4c3a-b4b2-7f1ed7016ddf

📥 Commits

Reviewing files that changed from the base of the PR and between d3ab7cc and 34f477b.

📒 Files selected for processing (6)
  • README.md
  • plane_mcp/instructions.py
  • plane_mcp/tools/__init__.py
  • plane_mcp/tools/docs.py
  • pyproject.toml
  • tests/test_docs_search.py

Comment thread plane_mcp/tools/docs.py
Comment on lines +233 to +239
for token in query_tokens:
    title_hits = title_l.count(token)
    desc_hits = desc_l.count(token)
    body_hits = body_l.count(token)
    if title_hits or desc_hits or body_hits:
        matched_terms += 1
    score += 6.0 * _saturate(title_hits) + 3.0 * _saturate(desc_hits) + 2.0 * _saturate(body_hits)

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use token or word-boundary matches instead of raw substrings.

str.count() / find() on the raw text makes any substring a hit, so a query like api will also score and excerpt unrelated words that merely contain api. That skews the core ranking behavior and can generate misleading snippets. Count against tokenized fields or use word-boundary matching instead.

Also applies to: 260-262
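
To illustrate the difference, here is a small sketch comparing substring counting with whole-word counting. The helper names `substring_hits`, `word_hits`, and `first_word_match` are hypothetical and not taken from the PR; the tokenizer pattern is an assumption about what counts as a word.

```python
import re
from collections import Counter
from typing import Optional

# Assumption: a "word" is a run of ASCII letters and digits.
_WORD_RE = re.compile(r"[a-z0-9]+")


def substring_hits(text: str, token: str) -> int:
    """Current behaviour: count every occurrence, even inside longer words."""
    return text.lower().count(token)


def word_hits(text: str, token: str) -> int:
    """Suggested behaviour: count only whole-token matches."""
    return Counter(_WORD_RE.findall(text.lower()))[token]


def first_word_match(text: str, token: str) -> Optional[int]:
    """Return the offset of the first whole-word match, for anchoring a snippet."""
    match = re.search(rf"\b{re.escape(token)}\b", text, re.IGNORECASE)
    return match.start() if match else None


text = "Rapid capital growth via the API"
print(substring_hits(text, "api"))  # counts "rapid", "capital", and "API"
print(word_hits(text, "api"))       # counts only "API"
```

Tokenizing each field once and reusing the resulting `Counter` for every query term would also avoid rescanning the full text per token.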

Comment thread plane_mcp/tools/docs.py
Comment on lines +291 to +303
def _get_corpus(source: str) -> list[DocPage]:
    """Return the cached corpus for a source, fetching it if stale or missing."""
    cached = _CACHE.get(source)
    if cached and time.time() - cached[0] < _CACHE_TTL:
        return cached[1]
    with _CACHE_LOCK:
        # Re-check under the lock so concurrent callers don't double-fetch.
        cached = _CACHE.get(source)
        if cached and time.time() - cached[0] < _CACHE_TTL:
            return cached[1]
        pages = _fetch_corpus(source)
        _CACHE[source] = (time.time(), pages)
        return pages

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Cache fetch failures instead of retrying them on every request.

Right now only successful fetches populate _CACHE. If either docs host is slow or down, every search_docs call pays the full network timeout again and keeps hammering the failing upstream, even though _search() already treats that failure as non-fatal. A short negative-cache/backoff entry would keep the tool responsive during outages.
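
One possible shape for such a negative cache is sketched below. It is not the PR's code: the 60-second backoff, the `CorpusUnavailable` exception, the injected `fetch` and `now` parameters, and the cache tuple layout are all assumptions chosen to keep the example self-contained. For brevity it holds the lock for the whole lookup instead of using the double-checked pattern.

```python
import threading
import time
from typing import Callable, Optional

_CACHE_TTL = 3600.0       # 1h TTL, as stated in the PR summary
_FAILURE_BACKOFF = 60.0   # assumption: the review does not name a value

# source -> (expires_at, pages or None, error message or None)
_CACHE: dict[str, tuple[float, Optional[list], Optional[str]]] = {}
_CACHE_LOCK = threading.Lock()


class CorpusUnavailable(Exception):
    """Raised when a source failed recently and is still in its backoff window."""


def get_corpus(
    source: str,
    fetch: Callable[[str], list],
    now: Callable[[], float] = time.monotonic,
) -> list:
    """Return cached pages, remembering failures for a short backoff period."""
    with _CACHE_LOCK:
        entry = _CACHE.get(source)
        if entry and now() < entry[0]:
            if entry[1] is None:
                # Recent failure: fail fast instead of hitting the network again.
                raise CorpusUnavailable(entry[2])
            return entry[1]
        try:
            pages = fetch(source)
        except Exception as exc:
            _CACHE[source] = (now() + _FAILURE_BACKOFF, None, str(exc))
            raise CorpusUnavailable(str(exc)) from exc
        _CACHE[source] = (now() + _CACHE_TTL, pages, None)
        return pages
```

A caller that already treats fetch failures as non-fatal could catch `CorpusUnavailable` and report it in its warnings field, so an outage costs one timeout per backoff window rather than one per request.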

Comment thread plane_mcp/tools/docs.py
Comment on lines +347 to +377
def register_docs_tools(mcp: FastMCP) -> None:
    """Register documentation-search tools with the MCP server."""

    @mcp.tool()
    def search_docs(
        query: str,
        source: Literal["all", "help", "developer"] = "all",
        limit: int = 5,
        full_text: bool = False,
    ) -> dict:
        """
        Search Plane's official docs for how-to and conceptual answers.

        Use for any how / what / why question about Plane — product usage
        (docs.plane.so) or building on Plane: REST API, self-hosting, OAuth,
        webhooks, MCP (developers.plane.so). Prefer over action tools, which act but
        do not explain. Find a page with the default snippets, then re-call with
        full_text=True, limit=1 to read it in full from cache (no URL fetch needed).

        Args:
            query: Question or keywords, e.g. "how to create a cycle".
            source: "help" (product), "developer" (API / build), or "all" (default).
            limit: Max results, 1-20 (default 5).
            full_text: True returns each page's full "content" instead of a
                "snippet"; use with limit=1.

        Returns:
            {"query", "results": [{"title", "url", "source", "score", and "snippet"
            or "content"}]}; "error"/"warnings" only if a docs site fetch failed.
        """
        return _search(query, source, limit, full_text)

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Align search_docs with the plane_mcp/tools contract.

This tool is registered from plane_mcp/tools/, but it returns a raw dict and skips get_plane_client_context(). That breaks the repo’s tool-level contract for schema/registration consistency; either move this out of the contract-bound package or add the standard context lookup plus a plane-sdk Pydantic response model. As per coding guidelines, plane_mcp/tools/**/*.py: Tool functions must return Pydantic models from plane-sdk and Each tool must call get_plane_client_context() to obtain client and workspace_slug.

Source: Coding guidelines
