feat: add search_docs tool for searching Plane documentation#158
sriramveeraghanta wants to merge 2 commits into
Add a `search_docs` MCP tool that runs local full-text search over Plane's
two public documentation sites and returns ranked results with snippets.
- docs.plane.so (product/help) and developers.plane.so (API, self-hosting,
OAuth, webhooks, MCP), selectable via a `source` filter ("all" by default).
- Fetches each site's Mintlify llms-full.txt, caches it in-process (1h TTL),
splits it into per-page records, and ranks by saturated, field-weighted
term frequency (title > description > body) so coverage beats repetition.
- Handles YAML block-scalar frontmatter, strips .html suffixes, and decodes
HTML entities in titles/snippets.
- Fetch failures are surfaced as a non-fatal error/warnings field.
Adds httpx as an explicit dependency and unit tests for the parser and ranker
(no network required).
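A minimal sketch of what "saturated, field-weighted term frequency" could look like. The field weights (6/3/2) match the diff under review; the `saturate` curve, the `tokenize` helper, the dict-shaped page, and the 1.5x boost when every query term matches are assumptions made for illustration, not the PR's actual implementation.

```python
import re

# Field weights as they appear in the diff; everything else here is assumed.
WEIGHTS = {"title": 6.0, "description": 3.0, "body": 2.0}


def saturate(hits: int) -> float:
    """Map a raw hit count into [0, 1) with diminishing returns (assumed curve)."""
    return hits / (hits + 1.0)


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_page(query: str, page: dict[str, str]) -> float:
    """Score one page against a query; higher is better."""
    query_tokens = set(tokenize(query))
    fields = {name: tokenize(page.get(name, "")) for name in WEIGHTS}
    score = 0.0
    matched = 0
    for token in query_tokens:
        hits = {name: toks.count(token) for name, toks in fields.items()}
        if any(hits.values()):
            matched += 1
        score += sum(WEIGHTS[name] * saturate(n) for name, n in hits.items())
    # Assumed boost factor: reward pages that cover every query term.
    if query_tokens and matched == len(query_tokens):
        score *= 1.5
    return score


focused = {"title": "Create a cycle", "description": "", "body": "Steps to create a cycle."}
repetitive = {"title": "Changelog", "description": "", "body": "cycle " * 50}
```

Because each field's contribution is capped by its weight, fifty repetitions of one term in a body cannot outscore a page that mentions every query term once in its title.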
📝 Walkthrough
Adds a new documentation search tool, wires it into MCP registration and server instructions, updates the README, adds an …
Changes
Documentation Search Feature
Sequence Diagram(s)
sequenceDiagram
participant MCPClient
participant FastMCP
participant search_docs
participant DocsPlane as docs.plane.so/llms-full.txt
participant DevPlane as developers.plane.so/llms-full.txt
MCPClient->>FastMCP: ask for a how/what/why answer
FastMCP->>search_docs: invoke query, source, limit, full_text
search_docs->>DocsPlane: GET llms-full.txt
search_docs->>DevPlane: GET llms-full.txt
search_docs->>search_docs: parse pages, score matches, build snippet or content
search_docs-->>FastMCP: ranked results with optional error/warnings
FastMCP-->>MCPClient: tool response
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plane_mcp/tools/docs.py`:
- Around line 347-377: `search_docs` currently returns a raw dict and does not
follow the `plane_mcp/tools` contract. Update
`register_docs_tools`/`search_docs` to call `get_plane_client_context()` first,
then use the returned client and workspace slug when performing the docs lookup.
Replace the dict return with a plane-sdk Pydantic response model for the search
results so tool registration and schema stay consistent with the rest of
`plane_mcp/tools`.
- Around line 233-239: The scoring logic in docs.py is using raw substring
counting in the query matching path, so terms like api can match unrelated words
and skew ranking/snippets. Update the matching in the token loop inside the
search/ranking code (including the related excerpt logic around the referenced
matching helper) to use tokenized comparisons or word-boundary matching instead
of str.count/find on the full lowercase text, and keep the scoring based on true
word hits only.
- Around line 291-303: The _get_corpus cache only stores successful fetches, so
repeated _search() calls will keep retrying a failing docs source on every
request. Update _get_corpus to record a short-lived negative cache/backoff entry
for _fetch_corpus failures, and have the cached lookup in _CACHE distinguish
between a normal pages result and a recent failure so search_docs can skip
hammering the same source during outages while still retrying after the backoff
expires.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: abca5c50-9436-4c3a-b4b2-7f1ed7016ddf
📒 Files selected for processing (6)
- README.md
- plane_mcp/instructions.py
- plane_mcp/tools/__init__.py
- plane_mcp/tools/docs.py
- pyproject.toml
- tests/test_docs_search.py
    for token in query_tokens:
        title_hits = title_l.count(token)
        desc_hits = desc_l.count(token)
        body_hits = body_l.count(token)
        if title_hits or desc_hits or body_hits:
            matched_terms += 1
        score += 6.0 * _saturate(title_hits) + 3.0 * _saturate(desc_hits) + 2.0 * _saturate(body_hits)
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use token or word-boundary matches instead of raw substrings.
str.count() / find() on the raw text makes any substring a hit, so a query like api will also score and excerpt unrelated words that merely contain api. That skews the core ranking behavior and can generate misleading snippets. Count against tokenized fields or use word-boundary matching instead.
Also applies to: 260-262
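One way the suggested word-boundary matching could look, as a sketch only. `count_word_hits` is a hypothetical helper, not a function from the PR, and it assumes query tokens are plain alphanumeric words (so `\b` anchors behave as expected).

```python
import re


def count_word_hits(token: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of `token` in `text`."""
    pattern = re.compile(rf"\b{re.escape(token)}\b", re.IGNORECASE)
    return len(pattern.findall(text))


text = "Rapid capital growth; see the API reference."

# Raw substring counting also matches inside "rapid" and "capital".
substring_hits = text.lower().count("api")

# Word-boundary counting matches only the standalone word "API".
word_hits = count_word_hits("api", text)
```

The same pattern's `search()` could drive snippet selection, so excerpts are centred on true word hits rather than on incidental substrings.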
    def _get_corpus(source: str) -> list[DocPage]:
        """Return the cached corpus for a source, fetching it if stale or missing."""
        cached = _CACHE.get(source)
        if cached and time.time() - cached[0] < _CACHE_TTL:
            return cached[1]
        with _CACHE_LOCK:
            # Re-check under the lock so concurrent callers don't double-fetch.
            cached = _CACHE.get(source)
            if cached and time.time() - cached[0] < _CACHE_TTL:
                return cached[1]
            pages = _fetch_corpus(source)
            _CACHE[source] = (time.time(), pages)
            return pages
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Cache fetch failures instead of retrying them on every request.
Right now only successful fetches populate _CACHE. If either docs host is slow or down, every search_docs call pays the full network timeout again and keeps hammering the failing upstream, even though _search() already treats that failure as non-fatal. A short negative-cache/backoff entry would keep the tool responsive during outages.
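A rough sketch of the negative-cache idea, under stated assumptions: the 60-second `FAILURE_TTL`, the `CorpusUnavailable` exception, the injected `fetch` and `now` callables, and the tuple layout are all hypothetical and chosen for illustration. For brevity it holds the lock for the whole lookup instead of reproducing the double-checked pattern in the PR.

```python
import threading
import time
from typing import Callable, Optional

CACHE_TTL = 3600.0   # success TTL (1h, as described in the PR)
FAILURE_TTL = 60.0   # assumed backoff window after a failed fetch

# source -> (timestamp, pages or None on failure, error message or None)
_cache: dict[str, tuple[float, Optional[list[str]], Optional[str]]] = {}
_lock = threading.Lock()


class CorpusUnavailable(RuntimeError):
    """Raised when a source failed recently and is still in backoff."""


def _is_fresh(entry: tuple[float, Optional[list[str]], Optional[str]], now: float) -> bool:
    timestamp, pages, _ = entry
    ttl = CACHE_TTL if pages is not None else FAILURE_TTL
    return now - timestamp < ttl


def get_corpus(
    source: str,
    fetch: Callable[[str], list[str]],
    now: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Return cached pages, refetching only when the entry has expired."""
    with _lock:
        entry = _cache.get(source)
        if entry is None or not _is_fresh(entry, now()):
            try:
                entry = (now(), fetch(source), None)
            except Exception as exc:  # record the failure instead of retrying at once
                entry = (now(), None, str(exc))
            _cache[source] = entry
    _, pages, error = entry
    if pages is None:
        raise CorpusUnavailable(error)
    return pages
```

With this shape, a caller that already treats fetch failures as non-fatal can catch `CorpusUnavailable` and report a warning without paying the network timeout again until the backoff expires.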
    def register_docs_tools(mcp: FastMCP) -> None:
        """Register documentation-search tools with the MCP server."""

        @mcp.tool()
        def search_docs(
            query: str,
            source: Literal["all", "help", "developer"] = "all",
            limit: int = 5,
            full_text: bool = False,
        ) -> dict:
            """
            Search Plane's official docs for how-to and conceptual answers.

            Use for any how / what / why question about Plane - product usage
            (docs.plane.so) or building on Plane: REST API, self-hosting, OAuth,
            webhooks, MCP (developers.plane.so). Prefer over action tools, which act but
            do not explain. Find a page with the default snippets, then re-call with
            full_text=True, limit=1 to read it in full from cache (no URL fetch needed).

            Args:
                query: Question or keywords, e.g. "how to create a cycle".
                source: "help" (product), "developer" (API / build), or "all" (default).
                limit: Max results, 1-20 (default 5).
                full_text: True returns each page's full "content" instead of a
                    "snippet"; use with limit=1.

            Returns:
                {"query", "results": [{"title", "url", "source", "score", and "snippet"
                or "content"}]}; "error"/"warnings" only if a docs site fetch failed.
            """
            return _search(query, source, limit, full_text)
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Align search_docs with the plane_mcp/tools contract.
This tool is registered from plane_mcp/tools/, but it returns a raw dict and skips get_plane_client_context(). That breaks the repo's tool-level contract for schema/registration consistency; either move this out of the contract-bound package or add the standard context lookup plus a plane-sdk Pydantic response model. As per the coding guidelines for plane_mcp/tools/**/*.py: "Tool functions must return Pydantic models from plane-sdk" and "Each tool must call get_plane_client_context() to obtain client and workspace_slug".
Source: Coding guidelines
Summary
Adds a `search_docs` MCP tool that runs local full-text search over Plane's two public documentation sites and returns ranked results with snippets and URLs.
- `source` filter: `help` → docs.plane.so (product/usage) · `developer` → developers.plane.so (API reference, self-hosting, OAuth, webhooks, MCP) · `all` → both.
- Fetches each site's Mintlify `llms-full.txt`, caches it in-process (1h TTL, with a double-checked lock so concurrent calls don't double-fetch), splits it into per-page records, and ranks by saturated, field-weighted term frequency (title > description > body, plus a coverage boost when all query terms match) so a focused page beats a long page that merely repeats a term.
- Results are `{title, url, source, score, snippet}`. Fetch failures are surfaced as a non-fatal `error`/`warnings` field rather than crashing the tool.
Why
Lets assistants answer "how do I…" (product) and "how do I build on Plane…" (API/self-host) questions by citing the official docs, instead of relying on stale model knowledge.
Implementation notes
- `plane_mcp/tools/docs.py`, registered via `tools/__init__.py` (matches the existing per-domain pattern). No Plane auth; the docs are public.
- Helpers (`_parse_llms_full`, `_parse_yaml_fields`, `_score_page`, `_make_snippet`, `_search`) are split from I/O so the parser and ranker are unit-tested with no network.
- Handles YAML block-scalar frontmatter (`url: >-` on API-reference pages), strips `.html` suffixes for clean browsable URLs, and decodes HTML entities in titles/snippets.
- Adds `httpx` as an explicit dependency (already transitive via fastmcp).
Testing
- `tests/test_docs_search.py` (parser, tokenizer, ranking, source filter, limits, error handling); no network required.
- `ruff check` and `ruff format` clean.
- `/api-reference/issue/add-issue`, `/self-hosting/methods/docker-compose`, `/core-concepts/cycles`
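The URL and title cleanup mentioned in the implementation notes might look roughly like the sketch below. `clean_url` and `clean_title` are hypothetical names, and the example URL is only assembled from a path quoted in this PR for illustration; the PR's own helpers may differ.

```python
import html


def clean_url(url: str) -> str:
    """Drop a trailing `.html` so the link points at the browsable page."""
    return url.removesuffix(".html")


def clean_title(title: str) -> str:
    """Decode HTML entities such as `&amp;` and trim surrounding whitespace."""
    return html.unescape(title).strip()


# Illustrative inputs only; not taken from a real llms-full.txt payload.
example_url = clean_url("https://developers.plane.so/api-reference/issue/add-issue.html")
example_title = clean_title("  Cycles &amp; Modules ")
```

Both helpers are pure string functions, which fits the notes' point that parsing is kept separate from I/O and can be tested without a network.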