layer1

A fully self-hostable RAG memory layer built in Rust. Combines local vector embeddings via llama.cpp, knowledge graphs, BM25 full-text search, and reranking into a single binary that runs on your machine.

Designed to plug into any AI agent through an OpenAI-compatible API, an MCP server for Claude Code and Cursor, or direct HTTP.

Features

Local embeddings using any GGUF model via llama.cpp (no cloud required)
Vector similarity search via sqlite-vec with float32 vectors
Knowledge graph storage using LightRAG-style entity and relationship extraction
Hybrid search: vector + BM25 with reciprocal rank fusion
Optional cross-encoder reranking
OpenAI-compatible /v1/chat/completions and /v1/models endpoints
MCP server (stdio) for Claude Code and Cursor
CLI for ingesting files, searching, and managing models
Model downloads from HuggingFace Hub (with or without a token)
Single SQLite database — no external services
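
The hybrid search item above names reciprocal rank fusion (RRF). The following is a minimal sketch of the general technique, not layer1's actual implementation: each ranked list contributes 1 / (k + rank) for every ID it contains, and IDs are ordered by the summed score. The constant k = 60 is a commonly used default and is an assumption here, as are the function name and the chunk IDs.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["c1", "c2", "c3"]  # hypothetical chunk IDs from vector search
bm25_hits = ["c3", "c1", "c4"]    # hypothetical chunk IDs from BM25 search
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

IDs that appear in both lists (c1 and c3 here) accumulate two contributions and so rise above IDs found by only one retriever.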

Requirements

Rust 1.75 or newer
cmake (for llama.cpp compilation)
On Windows: Visual C++ Build Tools (MSVC)
On macOS: Xcode Command Line Tools (Metal acceleration automatic on Apple Silicon)
On Linux: gcc or clang

Installation

git clone <this-repo>
cd layer1
cargo build --release
cp layer1.toml.example layer1.toml

Add the binary to your PATH or invoke it as ./target/release/layer1.

Quick start

1. Download an embedding model

layer1 model pull nomic-ai/nomic-embed-text-v1.5-GGUF nomic-embed-text-v1.5.Q4_K_M.gguf

2. Configure

Copy layer1.toml.example to layer1.toml (if you did not already do so during installation) and set embedding_model to the path of the downloaded model:

[server]
host = "127.0.0.1"
port = 3000
api_key = "your-secret-key"

[database]
path = "layer1.db"
embedding_dim = 768

[models]
embedding_model = "models/nomic-embed-text-v1.5.Q4_K_M.gguf"
n_gpu_layers = 0
models_dir = "models"

[rag]
chunk_size = 512
chunk_overlap = 64
top_k = 10
rerank_top_k = 5
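
The chunk_size and chunk_overlap settings suggest a sliding window over the input. Here is a rough sketch of that idea, assuming the unit is whitespace-separated words; this README does not say whether layer1 counts tokens, words, or characters, and chunk_words is a hypothetical helper, not part of layer1.

```python
def chunk_words(text, chunk_size=512, chunk_overlap=64):
    """Split text into windows of chunk_size words that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# 1000 numbered "words" make the window boundaries easy to inspect.
chunks = chunk_words(" ".join(str(i) for i in range(1000)))
print(len(chunks), chunks[1].split()[0])
```

With the defaults, each window starts 448 words after the previous one, so neighbouring chunks share 64 words of context.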

3. Start the server

layer1 serve

4. Ingest content

# From a file
layer1 ingest README.md

# From stdin
echo "The Eiffel Tower is in Paris." | layer1 ingest -

# With metadata
layer1 ingest notes.txt --metadata '{"source": "personal"}'

5. Search

layer1 search "what is the Eiffel Tower"

6. Connect Claude Code or Cursor

layer1 init

This writes .claude/mcp.json and .cursor/mcp.json with the MCP server config, and .claude/skills/layer1.md for use as a Claude Code skill.

CLI reference

layer1 [--config <path>] <command>

Commands:
  serve                      Start the HTTP API server
  mcp                        Start MCP server on stdio
  model pull <repo> <file>   Download a GGUF model from HuggingFace
  model list                 List downloaded models
  model remove <name>        Delete a model file
  ingest <path|->            Ingest a file or stdin into memory
  search <query>             Semantic search over stored memory
  chat                       Interactive chat with RAG context
  init                       Generate MCP config files for agent clients

API reference

All endpoints except /health require the X-API-Key header (or Authorization: Bearer <key>).

Store content

POST /api/ingest
Content-Type: application/json
X-API-Key: <key>

{
  "content": "...",
  "metadata": { "source": "docs" }
}

Response:

{ "document_id": "uuid", "chunk_count": 4 }
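
As an illustration of the request above, here is a small Python sketch that builds the ingest call using only the standard library. The base URL and key are taken from the sample configuration; build_ingest_request is a hypothetical helper name, and the actual network call is left commented out so the block runs without a server.

```python
import json
import urllib.request

def build_ingest_request(base_url, api_key, content, metadata=None):
    """Build (but do not send) a POST /api/ingest request."""
    body = {"content": content}
    if metadata is not None:
        body["metadata"] = metadata
    return urllib.request.Request(
        url=f"{base_url}/api/ingest",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )

ingest_req = build_ingest_request(
    "http://127.0.0.1:3000",          # host and port from the sample layer1.toml
    "your-secret-key",                # placeholder key from the sample layer1.toml
    "The Eiffel Tower is in Paris.",
    {"source": "docs"},
)

# To send it against a running server:
#   with urllib.request.urlopen(ingest_req) as resp:
#       print(json.load(resp))  # expected shape: document_id and chunk_count
```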

Search

POST /api/search
Content-Type: application/json
X-API-Key: <key>

{
  "query": "...",
  "top_k": 5,
  "include_graph": false
}

Response:

{
  "results": [
    { "chunk_id": "...", "document_id": "...", "content": "...", "score": 0.91, "source": "hybrid" }
  ],
  "graph_context": []
}
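
A client will usually post-process the results array. The sketch below parses a response of the shape shown above and keeps only hits above a score threshold. The sample values and the 0.5 threshold are illustrative assumptions, not output from a real layer1 server, and best_hits is a hypothetical helper.

```python
import json

# Illustrative payload mirroring the documented response shape.
SAMPLE_RESPONSE = """
{
  "results": [
    {"chunk_id": "a", "document_id": "d1", "content": "Paris is in France.",
     "score": 0.42, "source": "hybrid"},
    {"chunk_id": "b", "document_id": "d1", "content": "The Eiffel Tower is in Paris.",
     "score": 0.91, "source": "hybrid"}
  ],
  "graph_context": []
}
"""

def best_hits(response_text, min_score=0.5):
    """Return results with score >= min_score, highest score first."""
    payload = json.loads(response_text)
    hits = [r for r in payload.get("results", []) if r["score"] >= min_score]
    return sorted(hits, key=lambda r: r["score"], reverse=True)

hits = best_hits(SAMPLE_RESPONSE)
for hit in hits:
    print(hit["chunk_id"], hit["score"], hit["content"])
```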

Chat (OpenAI-compatible)

POST /v1/chat/completions
Content-Type: application/json
X-API-Key: <key>

{
  "model": "layer1",
  "messages": [{ "role": "user", "content": "What do you know about Paris?" }]
}

The server injects RAG context from memory into the system prompt automatically. Set generation_model in layer1.toml for AI-generated responses.
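
A hedged sketch of calling this endpoint from Python with the standard library, using the Authorization: Bearer form of the key. The extract_reply helper assumes the response follows the usual OpenAI chat-completion layout (choices[0].message.content); this README does not show the response body, so treat that as an assumption. Both helper names are hypothetical.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, user_message):
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": "layer1",
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response (assumed shape)."""
    return response_json["choices"][0]["message"]["content"]

chat_req = build_chat_request(
    "http://127.0.0.1:3000", "your-secret-key", "What do you know about Paris?"
)

# To send it against a running server:
#   with urllib.request.urlopen(chat_req) as resp:
#       print(extract_reply(json.load(resp)))
```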

List models

GET /v1/models
X-API-Key: <key>

MCP server

Run the MCP server for Claude Code or Cursor:

layer1 mcp

The server communicates over stdio using JSON-RPC 2.0. It exposes four tools:

| Tool | Description |
| --- | --- |
| `store_memory` | Store text with optional metadata |
| `search_memory` | Semantic search over stored memories |
| `list_memories` | List recent documents |
| `delete_memory` | Remove a document by ID |

Add to .claude/mcp.json:

{
  "mcpServers": {
    "layer1": {
      "command": "layer1",
      "args": ["--config", "layer1.toml", "mcp"]
    }
  }
}

Or run layer1 init to generate all config files automatically.
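
For reference, a tool invocation over the stdio transport is a single-line JSON-RPC 2.0 message using the MCP tools/call method. The sketch below only builds such a message. The argument name content is a guess at what store_memory accepts (the tool's input schema is not shown in this README), and a real client must complete the MCP initialize handshake before calling tools.

```python
import json

def tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines, so the JSON itself
    # must not contain raw newlines; json.dumps escapes them inside strings.
    return json.dumps(message) + "\n"

# "content" is a hypothetical argument name for store_memory.
line = tool_call(1, "store_memory", {"content": "The Eiffel Tower is in Paris."})
print(line, end="")
```

In practice, Claude Code and Cursor produce these messages for you once the config file is in place; the sketch is mainly useful for debugging the server by hand.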

GPU acceleration

Set n_gpu_layers in layer1.toml:

0 = CPU only (default)
-1 = all layers on GPU

On macOS with Apple Silicon, Metal is enabled automatically. On Linux with CUDA, build with:

LLAMA_CUDA=1 cargo build --release

Note on Windows and large models

Due to a known issue in llama-cpp-sys-2, GGUF models larger than 4 GB may fail to load on Windows with MSVC. Use models under 4 GB (Q4_K_S or smaller) or build with the MinGW toolchain as a workaround. See utilityai/llama-cpp-rs#951 for status.

Database schema

All data is stored in a single SQLite file:

| Table | Contents |
| --- | --- |
| `documents` | Full source documents with metadata |
| `chunks` | Text chunks with document references |
| `chunk_vec` | Vector embeddings for chunks (sqlite-vec) |
| `chunks_fts` | FTS5 full-text index for BM25 search |
| `entities` | Named entities extracted from documents |
| `entity_vec` | Vector embeddings for entities |
| `relationships` | Entity relationships with strength scores |
| `relation_vec` | Vector embeddings for relationships |
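
To show what BM25 search over an FTS5 index looks like, here is a self-contained sketch against an in-memory database. The single-column chunks_fts table below is a stand-in with a hypothetical layout; layer1's real table definition may differ. It also assumes Python's bundled SQLite was compiled with FTS5, which is the case for most distributions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column FTS5 table; the real chunks_fts may have more columns.
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO chunks_fts (content) VALUES (?)",
    [
        ("The Eiffel Tower is in Paris.",),
        ("Rust compiles to a single binary.",),
        ("Paris is the capital of France.",),
    ],
)

# bm25() returns lower values for better matches, so ascending order is best first.
rows = conn.execute(
    "SELECT content, bm25(chunks_fts) AS score "
    "FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY score",
    ("paris",),
).fetchall()

for content, score in rows:
    print(content, score)
```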

License

MIT
