feat: add symbol definition/reference tracking via tags.scm queries #524

Description

@vitali87

Background

Tree-sitter's tags.scm query files define patterns for identifying symbol definitions (where classes, functions, and methods are declared) and references (where they are used). These are the same queries that power GitHub's "Go to Definition" and "Find References" features.

Problem

Our current definition tracking is built from custom tree-sitter queries in language_spec.py and constants.py. While functional, this approach:

  • Misses some definition types (e.g., TypeScript type aliases, Rust trait implementations, Java interface implementations)
  • Doesn't track references at all: we only discover references through call resolution, so we miss references in type annotations, variable declarations, and inheritance clauses
  • Requires manual maintenance per language when grammars update

The upstream tags.scm files are community-maintained and already capture definition/reference pairs we're missing.

Approach

Vendor tags.scm files from official tree-sitter grammar repos for all 12 supported languages. These are already available in the pip packages for 9/12 languages; we supplement the remaining 3 (TypeScript, Scala, C#) from the grammar repos.
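As a rough illustration of how vendored files might be looked up, here is a minimal sketch. It assumes one `<language>.scm` file per language inside a single directory; the helper names `tags_query_path` and `load_tags_query` and the file naming scheme are hypothetical, not part of the existing codebase.

```python
from pathlib import Path
from typing import Optional


def tags_query_path(language: str, queries_dir: Path) -> Path:
    """Return the expected location of a vendored tags query.

    Assumption: files are named "<language>.scm", e.g. "java.scm".
    """
    return queries_dir / f"{language}.scm"


def load_tags_query(language: str, queries_dir: Path) -> Optional[str]:
    """Read the vendored tags.scm text, or return None if it is absent."""
    path = tags_query_path(language, queries_dir)
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")
```

Returning None rather than raising lets the parser fall back to the existing structural queries for any language whose tags file has not been vendored yet.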

What tags.scm captures

Capture types with semantic meaning:

  • @definition.class — class/struct/enum declarations
  • @definition.function — function declarations
  • @definition.method — method declarations
  • @definition.interface — interface declarations (Java, TS, Go)
  • @reference.call — function/method call sites
  • @reference.class — class references (inheritance, instantiation)
  • @reference.implementation — interface implementation references
  • @name — the identifier name at each definition/reference site
  • @doc — associated documentation comments

Example from Java's tags.scm:

(class_declaration
  name: (identifier) @name) @definition.class

(interface_declaration
  name: (identifier) @name) @definition.interface

(type_list
  (type_identifier) @name) @reference.implementation

(superclass (type_identifier) @name) @reference.class
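Code that consumes these captures needs to separate the role (definition or reference) from the symbol kind, and to treat helper captures such as @name and @doc differently. A minimal sketch in plain Python follows; `TagKind` and `parse_capture_name` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class TagKind(NamedTuple):
    role: str  # "definition" or "reference"
    kind: str  # e.g. "class", "method", "call", "implementation"


def parse_capture_name(capture: str) -> Optional[TagKind]:
    """Split a capture such as "@definition.class" into role and kind.

    Returns None for helper captures like "@name" and "@doc", which
    carry no role of their own.
    """
    name = capture.lstrip("@")
    role, sep, kind = name.partition(".")
    if sep and kind and role in ("definition", "reference"):
        return TagKind(role, kind)
    return None
```

For example, `parse_capture_name("@reference.implementation")` yields a `TagKind` with role `"reference"` and kind `"implementation"`, while `parse_capture_name("@name")` yields None.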

Implementation

  1. Create codebase_rag/queries/tags/ directory with .scm files per language
  2. Run tags queries alongside existing structural queries during parsing
  3. Cross-validate our definitions against @definition.* captures — log warnings for any definitions we miss
  4. Use @reference.* captures to create new relationship types or supplement existing CALLS/IMPORTS edges
  5. Use @doc captures to associate documentation comments with their definitions
  6. Add tests per language verifying tag capture completeness
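Step 3 could be sketched roughly as below. This assumes both the existing pipeline and the tags queries can be reduced to (name, kind, line) records; the `Definition` type and `find_missed_definitions` function are hypothetical, and matching on name plus line number is a simplifying assumption rather than a settled design.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, List

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Definition:
    name: str
    kind: str  # e.g. "class", "function", "interface"
    line: int  # line of the @name identifier


def find_missed_definitions(
    ours: Iterable[Definition], from_tags: Iterable[Definition]
) -> List[Definition]:
    """Return tag-derived definitions that our own queries did not find.

    Matching ignores `kind`, since the two sources may label the same
    symbol differently (an assumption of this sketch).
    """
    known = {(d.name, d.line) for d in ours}
    missed = sorted(
        (d for d in from_tags if (d.name, d.line) not in known),
        key=lambda d: d.line,
    )
    for d in missed:
        logger.warning(
            "definition missed by structural queries: %s %s at line %d",
            d.kind, d.name, d.line,
        )
    return missed
```

Returning the missed definitions in addition to logging them makes the same function usable from the per-language completeness tests in step 6.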

Languages

All 12: Python, JavaScript, TypeScript, Rust, Java, C, C++, Go, Lua, Scala, PHP, C#
