
amajorai/layer1



layer1

A fully self-hostable RAG memory layer built in Rust. Combines local vector embeddings via llama.cpp, knowledge graphs, BM25 full-text search, and reranking into a single binary that runs on your machine.

Designed to plug into any AI agent through an OpenAI-compatible API, an MCP server for Claude Code and Cursor, or direct HTTP.


Features

  • Local embeddings using any GGUF model via llama.cpp (no cloud required)
  • Vector similarity search via sqlite-vec with float32 vectors
  • Knowledge graph storage using LightRAG-style entity and relationship extraction
  • Hybrid search: vector + BM25 with reciprocal rank fusion
  • Optional cross-encoder reranking
  • OpenAI-compatible /v1/chat/completions and /v1/models endpoints
  • MCP server (stdio) for Claude Code and Cursor
  • CLI for ingesting files, searching, and managing models
  • Model downloads from HuggingFace Hub (with or without a token)
  • Single SQLite database — no external services
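Reciprocal rank fusion merges the vector and BM25 result lists using only each chunk's position in each list, so the two incompatible score scales never have to be normalized against each other. The sketch below is a generic, std-only Rust illustration, not layer1's implementation. The constant k = 60 is the value conventionally used with RRF and is an assumption here, as are the chunk IDs.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk IDs with reciprocal rank fusion:
/// score(d) = sum over lists of 1 / (k + rank), with 1-based ranks.
/// A chunk missing from a list simply gets no contribution from it.
fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical top-3 results from the vector index and the BM25 index.
    let vector_hits = vec!["c3", "c1", "c7"];
    let bm25_hits = vec!["c1", "c9", "c3"];
    let fused = reciprocal_rank_fusion(&[vector_hits, bm25_hits], 60.0);
    for (id, score) in &fused {
        println!("{id}\t{score:.5}");
    }
}
```

Here c1 (ranked second and first) ends up ahead of c3 (ranked first and third): a chunk that both retrievers like moderately can beat one that only a single retriever puts on top.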

Requirements

  • Rust 1.75 or newer
  • cmake (for llama.cpp compilation)
  • On Windows: Visual C++ Build Tools (MSVC)
  • On macOS: Xcode Command Line Tools (Metal acceleration is enabled automatically on Apple Silicon)
  • On Linux: gcc or clang

Installation

git clone <this-repo>
cd layer1
cargo build --release
cp layer1.toml.example layer1.toml

Add the binary to your PATH or invoke it as ./target/release/layer1.


Quick start

1. Download an embedding model

layer1 model pull nomic-ai/nomic-embed-text-v1.5-GGUF nomic-embed-text-v1.5.Q4_K_M.gguf
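For orientation, a model pull boils down to fetching one file from a Hugging Face repository, optionally with a bearer token for gated or private repos. This std-only sketch only builds the URL and header. It assumes the Hub's standard resolve endpoint on the main revision and the conventional HF_TOKEN variable; neither is confirmed by this README, and hf_download_url and hf_auth_header are hypothetical names.

```rust
/// Download URL for one file in a Hugging Face model repository.
/// Assumption: the file is fetched from the "main" revision through the Hub's
/// standard resolve endpoint; layer1 may pin a different revision.
fn hf_download_url(repo: &str, file: &str) -> String {
    format!("https://huggingface.co/{repo}/resolve/main/{file}")
}

/// Header to send for gated or private repositories; public ones need none.
fn hf_auth_header(token: Option<&str>) -> Option<String> {
    token.map(|t| format!("Authorization: Bearer {t}"))
}

fn main() {
    let url = hf_download_url(
        "nomic-ai/nomic-embed-text-v1.5-GGUF",
        "nomic-embed-text-v1.5.Q4_K_M.gguf",
    );
    println!("GET {url}");
    // HF_TOKEN is the Hub's conventional variable; whether layer1 reads it is an assumption.
    let token = std::env::var("HF_TOKEN").ok();
    match hf_auth_header(token.as_deref()) {
        Some(_) => println!("token found: an Authorization header would be sent"),
        None => println!("no token: anonymous download"),
    }
}
```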

2. Configure

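Edit layer1.toml (copied from layer1.toml.example during installation) and set embedding_model to the path of the downloaded model. embedding_dim must match the model's output dimension, which is 768 for nomic-embed-text-v1.5: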

[server]
host = "127.0.0.1"
port = 3000
api_key = "your-secret-key"

[database]
path = "layer1.db"
embedding_dim = 768

[models]
embedding_model = "models/nomic-embed-text-v1.5.Q4_K_M.gguf"
n_gpu_layers = 0
models_dir = "models"

[rag]
chunk_size = 512
chunk_overlap = 64
top_k = 10
rerank_top_k = 5
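chunk_size and chunk_overlap typically interact as a sliding window: each chunk holds up to chunk_size units and starts chunk_size - chunk_overlap units after the previous one, so with the sample values consecutive chunks share 64 units. The README does not say whether sizes are counted in tokens, words or characters. This std-only sketch assumes whitespace-separated words, and chunk_words is a hypothetical name.

```rust
/// Split text into overlapping windows of `chunk_size` words, where each
/// window repeats the last `chunk_overlap` words of the previous one.
/// Assumption: sizes are counted in whitespace-separated words.
fn chunk_words(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && chunk_overlap < chunk_size,
        "chunk_overlap must be smaller than chunk_size"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let stride = chunk_size - chunk_overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window already reaches the end of the text
        }
        start += stride;
    }
    chunks
}

fn main() {
    // A synthetic 1,000-word document: "w1 w2 ... w1000".
    let text: String = (1..=1000).map(|i| format!("w{i} ")).collect();
    let chunks = chunk_words(&text, 512, 64);
    println!("{} chunks", chunks.len());
}
```

With chunk_size = 512 and chunk_overlap = 64, the 1,000-word document yields three chunks starting at words 1, 449 and 897.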

3. Start the server

layer1 serve

4. Ingest content

# From a file
layer1 ingest README.md

# From stdin
echo "The Eiffel Tower is in Paris." | layer1 ingest -

# With metadata
layer1 ingest notes.txt --metadata '{"source": "personal"}'

5. Search

layer1 search "what is the Eiffel Tower"
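Conceptually, the vector half of a search embeds the query and ranks stored chunk embeddings by similarity. sqlite-vec performs that scan inside SQLite; the brute-force Rust sketch below only illustrates the idea. Cosine similarity is an assumption (the README does not state which distance metric layer1 configures), and the three-dimensional vectors stand in for real 768-dimensional embeddings. The length check shows why embedding_dim has to match the model.

```rust
/// Cosine similarity between two embeddings of the same dimension.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding_dim mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact (brute-force) top-k: score every stored chunk, keep the k best.
fn top_k<'a>(query: &[f32], chunks: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> =
        chunks.iter().map(|(id, v)| (*id, cosine(query, v))).collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    // Toy embeddings; real ones would have embedding_dim (768) entries.
    let chunks: Vec<(&str, Vec<f32>)> = vec![
        ("eiffel", vec![0.9, 0.1, 0.0]),
        ("louvre", vec![0.6, 0.4, 0.0]),
        ("rust", vec![0.0, 0.1, 0.9]),
    ];
    let query: [f32; 3] = [1.0, 0.0, 0.0];
    for (id, score) in top_k(&query, &chunks, 2) {
        println!("{id} {score:.3}");
    }
}
```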

6. Connect Claude Code or Cursor

layer1 init

This writes .claude/mcp.json and .cursor/mcp.json with the MCP server config, and .claude/skills/layer1.md for use as a Claude Code skill.


CLI reference

layer1 [--config <path>] <command>

Commands:
  serve                      Start the HTTP API server
  mcp                        Start MCP server on stdio
  model pull <repo> <file>   Download a GGUF model from HuggingFace
  model list                 List downloaded models
  model remove <name>        Delete a model file
  ingest <path|->            Ingest a file or stdin into memory
  search <query>             Semantic search over stored memory
  chat                       Interactive chat with RAG context
  init                       Generate MCP config files for agent clients

API reference

All endpoints except /health require the X-API-Key header (or Authorization: Bearer <key>).
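A minimal sketch of how a server can accept either form of the key. Checking X-API-Key before Authorization, and the helper names, are assumptions rather than layer1's code. The comparison avoids short-circuiting on the first mismatched byte; production code would normally use a vetted constant-time comparison crate instead.

```rust
/// Return the API key from `X-API-Key: <key>` or `Authorization: Bearer <key>`.
/// Header names compare case-insensitively, as HTTP requires; this sketch
/// only accepts the exact scheme prefix "Bearer ".
fn extract_api_key<'a>(headers: &[(&str, &'a str)]) -> Option<&'a str> {
    for &(name, value) in headers {
        if name.eq_ignore_ascii_case("x-api-key") {
            return Some(value.trim());
        }
    }
    for &(name, value) in headers {
        if name.eq_ignore_ascii_case("authorization") {
            if let Some(token) = value.strip_prefix("Bearer ") {
                return Some(token.trim());
            }
        }
    }
    None
}

/// Compare keys without stopping at the first differing byte, so timing
/// reveals less about how much of a guess was correct (length still leaks).
fn keys_match(provided: &str, expected: &str) -> bool {
    if provided.len() != expected.len() {
        return false;
    }
    provided
        .bytes()
        .zip(expected.bytes())
        .fold(0u8, |acc, (a, b)| acc | (a ^ b))
        == 0
}

fn main() {
    let headers = [
        ("Content-Type", "application/json"),
        ("Authorization", "Bearer your-secret-key"),
    ];
    let ok = extract_api_key(&headers).map_or(false, |k| keys_match(k, "your-secret-key"));
    println!("authorized: {ok}");
}
```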

Store content

POST /api/ingest
Content-Type: application/json
X-API-Key: <key>

{
  "content": "...",
  "metadata": { "source": "docs" }
}

Response:

{ "document_id": "uuid", "chunk_count": 4 }
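To call POST /api/ingest without pulling in an HTTP client crate, a std-only sketch can write the request directly over a TcpStream. It assumes the server from the sample configuration (127.0.0.1:3000, api_key "your-secret-key"); json_escape, ingest_request and send are hypothetical helpers, and a real client would more likely use crates such as reqwest and serde_json. Sending Connection: close asks the server to end the connection after responding, so read_to_string returns.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a raw HTTP/1.1 request for POST /api/ingest with a "source" metadata field.
fn ingest_request(host: &str, api_key: &str, content: &str, source: &str) -> String {
    let body = format!(
        "{{\"content\":\"{}\",\"metadata\":{{\"source\":\"{}\"}}}}",
        json_escape(content),
        json_escape(source)
    );
    format!(
        "POST /api/ingest HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nX-API-Key: {api_key}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len() // byte length, as Content-Length requires
    )
}

/// Send the request and return the raw response (status line, headers, body).
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let addr = "127.0.0.1:3000";
    let req = ingest_request(addr, "your-secret-key", "The Eiffel Tower is in Paris.", "docs");
    match send(addr, &req) {
        Ok(resp) => println!("{resp}"),
        Err(e) => eprintln!("is `layer1 serve` running? {e}"),
    }
}
```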

Search

POST /api/search
Content-Type: application/json
X-API-Key: <key>

{
  "query": "...",
  "top_k": 5,
  "include_graph": false
}

Response:

{
  "results": [
    { "chunk_id": "...", "document_id": "...", "content": "...", "score": 0.91, "source": "hybrid" }
  ],
  "graph_context": []
}

Chat (OpenAI-compatible)

POST /v1/chat/completions
Content-Type: application/json
X-API-Key: <key>

{
  "model": "layer1",
  "messages": [{ "role": "user", "content": "What do you know about Paris?" }]
}

The server automatically injects RAG context retrieved from memory into the system prompt. To get generated replies, set generation_model in layer1.toml; this key is not included in the example configuration above.
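The README does not document how the retrieved context is laid out in the system prompt. As a hypothetical illustration only, the sketch below numbers each retrieved chunk and places it ahead of the user's question; the instruction wording, the Chunk struct and build_system_prompt are all made up for this example.

```rust
/// A retrieved chunk; the fields mirror part of an /api/search result.
struct Chunk<'a> {
    document_id: &'a str,
    content: &'a str,
    score: f32,
}

/// Assemble a system prompt that lists retrieved chunks in rank order.
/// Hypothetical template; layer1's actual wording is not documented.
fn build_system_prompt(chunks: &[Chunk]) -> String {
    let mut prompt =
        String::from("Answer using the context below when it is relevant.\n\nContext:\n");
    for (i, c) in chunks.iter().enumerate() {
        prompt.push_str(&format!(
            "[{}] (document {}, score {:.2})\n{}\n\n",
            i + 1,
            c.document_id,
            c.score,
            c.content
        ));
    }
    prompt
}

fn main() {
    let hits = [Chunk { document_id: "doc-1", content: "The Eiffel Tower is in Paris.", score: 0.91 }];
    // The system message carries the context; the user's message is passed through unchanged.
    let messages = [
        ("system", build_system_prompt(&hits)),
        ("user", "What do you know about Paris?".to_string()),
    ];
    for (role, text) in &messages {
        println!("{role}: {text}");
    }
}
```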

List models

GET /v1/models
X-API-Key: <key>

MCP server

Run the MCP server for Claude Code or Cursor:

layer1 mcp

The server communicates over stdio using JSON-RPC 2.0. It exposes four tools:

  Tool            Description
  store_memory    Store text with optional metadata
  search_memory   Semantic search over stored memories
  list_memories   List recent documents
  delete_memory   Remove a document by ID
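Under the MCP stdio transport each JSON-RPC message is a single line of JSON, and a client sends initialize and the notifications/initialized notification before calling a tool with tools/call. The std-only sketch below spawns layer1 mcp and runs search_memory once. Assumptions: layer1 is on PATH, it accepts protocol version 2024-11-05 (one published MCP revision), and search_memory takes a query argument, since the README does not list tool parameters. A robust client would match replies by id rather than reading one line per request.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Command, Stdio};

/// Escape quotes, backslashes and newlines for a JSON string literal
/// (enough for this sketch, not a full JSON encoder).
fn json_str(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// initialize, the initialized notification, then a tools/call request.
/// Each message is one line: the stdio transport is newline-delimited.
fn handshake_and_call(query: &str) -> Vec<String> {
    vec![
        r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"sketch","version":"0.1"}}}"#.to_string(),
        r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string(),
        format!(
            r#"{{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{{"name":"search_memory","arguments":{{"query":"{}"}}}}}}"#,
            json_str(query)
        ),
    ]
}

fn run() -> io::Result<()> {
    let mut child = Command::new("layer1")
        .args(["--config", "layer1.toml", "mcp"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let mut stdout = BufReader::new(child.stdout.take().expect("stdout was piped"));
    for msg in handshake_and_call("Eiffel Tower") {
        writeln!(stdin, "{msg}")?;
        // Requests carry an id and get one reply; notifications get none.
        if msg.contains(r#""id":"#) {
            let mut reply = String::new();
            stdout.read_line(&mut reply)?;
            println!("<- {}", reply.trim_end());
        }
    }
    drop(stdin); // closing stdin signals the server to exit
    child.wait()?;
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("could not talk to `layer1 mcp`: {e}");
    }
}
```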

Add to .claude/mcp.json:

{
  "mcpServers": {
    "layer1": {
      "command": "layer1",
      "args": ["--config", "layer1.toml", "mcp"]
    }
  }
}

Or run layer1 init to generate all config files automatically.


GPU acceleration

Set n_gpu_layers in layer1.toml:

  • 0 = CPU only (default)
  • -1 = all layers on GPU
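The two documented values can be read as a small mapping from n_gpu_layers to the number of model layers llama.cpp offloads. Treating a positive value as "offload that many layers, capped at the model's depth" is an assumption based on llama.cpp's usual meaning of the setting, and rejecting other negative values is a choice made for this hypothetical helper.

```rust
/// Number of layers offloaded to the GPU for a given n_gpu_layers setting.
fn offloaded_layers(n_gpu_layers: i32, model_layers: u32) -> Result<u32, String> {
    match n_gpu_layers {
        0 => Ok(0),             // CPU only (documented default)
        -1 => Ok(model_layers), // all layers on GPU (documented)
        n if n > 0 => Ok((n as u32).min(model_layers)), // assumed: partial offload
        n => Err(format!("unsupported n_gpu_layers value: {n}")),
    }
}

fn main() {
    // A hypothetical 32-layer model.
    for setting in [0, -1, 20, 64] {
        println!("n_gpu_layers = {setting:>3} -> {:?}", offloaded_layers(setting, 32));
    }
}
```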

On macOS with Apple Silicon, Metal is enabled automatically. On Linux with CUDA, build with:

LLAMA_CUDA=1 cargo build --release

Note on Windows and large models

Due to a known issue in llama-cpp-sys-2, GGUF models larger than 4 GB may fail to load on Windows with MSVC. As a workaround, use model files under 4 GB (for example, a Q4_K_S or smaller quantization) or build with the MinGW toolchain. See utilityai/llama-cpp-rs#951 for status.
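A pre-flight check can warn before loading a model that is likely to hit this issue. This is a hypothetical helper, not part of layer1; it treats 4 GB as 4 GiB, and the exact threshold is an assumption.

```rust
use std::path::Path;

/// Size limit from the README ("4 GB"), taken here as 4 GiB.
const WINDOWS_MSVC_LIMIT: u64 = 4 * 1024 * 1024 * 1024;

fn exceeds_windows_limit(size_bytes: u64) -> bool {
    size_bytes > WINDOWS_MSVC_LIMIT
}

/// Print a warning on Windows/MSVC builds if the model file is over the limit.
fn check_model(path: &Path) -> std::io::Result<()> {
    let size = std::fs::metadata(path)?.len();
    if cfg!(all(windows, target_env = "msvc")) && exceeds_windows_limit(size) {
        eprintln!(
            "warning: {} is {:.1} GiB; models over 4 GB may fail to load with MSVC builds",
            path.display(),
            size as f64 / (1u64 << 30) as f64
        );
    }
    Ok(())
}

fn main() {
    let path = Path::new("models/nomic-embed-text-v1.5.Q4_K_M.gguf");
    if let Err(e) = check_model(path) {
        eprintln!("cannot read {}: {e}", path.display());
    }
}
```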


Database schema

All data is stored in a single SQLite file:

  Table           Contents
  documents       Full source documents with metadata
  chunks          Text chunks with document references
  chunk_vec       Vector embeddings for chunks (sqlite-vec)
  chunks_fts      FTS5 full-text index for BM25 search
  entities        Named entities extracted from documents
  entity_vec      Vector embeddings for entities
  relationships   Entity relationships with strength scores
  relation_vec    Vector embeddings for relationships
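The chunks_fts table relies on SQLite FTS5, whose built-in bm25() function does the ranking inside the database. To show what that score measures, the sketch below spells out Okapi BM25 with the k1 = 1.2 and b = 0.75 constants FTS5 documents. It uses the Lucene-style IDF with a "+ 1" term; FTS5's own IDF differs slightly, and FTS5 returns the score negated so that smaller values rank higher. The toy corpus is made up.

```rust
use std::collections::HashMap;

/// Okapi BM25 score of `doc` for `query` against a tokenized `corpus`.
fn bm25(query: &[&str], doc: &[&str], corpus: &[Vec<&str>]) -> f64 {
    if corpus.is_empty() {
        return 0.0;
    }
    let (k1, b) = (1.2, 0.75);
    let n = corpus.len() as f64;
    let avgdl = corpus.iter().map(|d| d.len()).sum::<usize>() as f64 / n;
    // Term frequencies within this document.
    let mut tf: HashMap<&str, f64> = HashMap::new();
    for t in doc {
        *tf.entry(*t).or_insert(0.0) += 1.0;
    }
    query
        .iter()
        .map(|term| {
            let df = corpus.iter().filter(|d| d.contains(term)).count() as f64;
            // The "+ 1" keeps IDF positive even for very common terms.
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            let f = *tf.get(term).unwrap_or(&0.0);
            idf * f * (k1 + 1.0) / (f + k1 * (1.0 - b + b * doc.len() as f64 / avgdl))
        })
        .sum()
}

fn main() {
    let corpus: Vec<Vec<&str>> = vec![
        "the eiffel tower is in paris".split(' ').collect(),
        "the louvre museum is in paris".split(' ').collect(),
        "layer1 stores vectors in sqlite".split(' ').collect(),
    ];
    let query = ["eiffel", "tower"];
    for (i, doc) in corpus.iter().enumerate() {
        println!("doc {i}: {:.3}", bm25(&query, doc, &corpus));
    }
}
```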

License

MIT

About

Self-hostable RAG memory layer with local embeddings, knowledge graphs, and OpenAI-compatible API
