unbug/monday

Monday - Browser AI Chat

Run open-source AI models directly in your browser. No server, no install, 100% private.

License: MIT

Live Demo · Changelog


Features

  • Zero Install — Pure browser experience, no downloads needed
  • Browser-Native Inference — Models run locally via WebGPU + WASM using Web-LLM
  • 23 Pre-configured Models — Qwen 2.5/3/3.5, SmolLM2, Gemma 2/3, Phi 3.5/4, Llama 3.2, DeepSeek R1, and more
  • Streaming Output — Token-by-token real-time response
  • Chat History — Persistent multi-session conversations via IndexedDB
  • Changelog — In-app version history with expandable release details
  • Usage Statistics — Dashboard with daily charts, per-model breakdown, and provider analytics
  • Model Comparison — Side-by-side generation from two models with real-time token stats
  • Model Benchmark — Built-in benchmark tool to measure tokens/sec and latency
  • Custom Model Import — Load custom MLC-compiled models from any HuggingFace URL
  • Download Resume — Resume interrupted model downloads from where you left off
  • Session Search — Search conversations by title with date filtering
  • Command Palette — Quick navigation with ⌘K
  • Prompt Templates & Personas — 8 built-in personas + custom persona creation
  • Message Actions — Edit and regenerate user messages inline
  • Generation Parameters — Per-session temperature, top-p, max tokens sliders
  • System Prompts — Customizable per-session system prompts
  • Token Counter — Real-time tokens/sec and total token usage
  • Model Cache Manager — View and delete cached models
  • Recent Models — Quick access to recently used models
  • Recommended Models — Top models based on your usage history
  • Storage Quota — Monitor browser storage usage
  • Markdown Rendering — Code highlighting, LaTeX math, GFM tables
  • Chat Export — Export conversations as Markdown
  • BorderBeam UI — Animated border effects with ocean/colorful/mono variants
  • Theme Toggle — Light / Dark / System with auto-detection
  • Mobile Responsive — Sidebar overlay, auto-close, safe-area support
  • PWA Ready — Web app manifest, apple-touch-icon
  • 100% Private — Nothing leaves your browser

Architecture

High-Level System Architecture

graph TB
    subgraph Browser["🌐 Browser (Client-Side Only)"]
        UI["React UI<br/>Vite + TypeScript"]
        Engine["Web-LLM Engine<br/>WebGPU / WASM"]
        IDB["IndexedDB<br/>Chat Persistence"]
        Cache["Browser Cache<br/>Model Weights"]
    end

    subgraph External["☁️ External (Read-Only)"]
        HF["HuggingFace CDN<br/>MLC Model Registry"]
        GHP["GitHub Pages<br/>Static Hosting"]
    end

    User(("👤 User")) --> UI
    UI -->|"streamChat()"| Engine
    Engine -->|"Token Stream"| UI
    UI -->|"saveSessions()"| IDB
    IDB -->|"loadSessions()"| UI
    Engine <-->|"Download Once"| HF
    Engine -->|"Cache Weights"| Cache
    GHP -->|"Serve SPA"| Browser

    style Browser fill:#1a1a2e,stroke:#a78bfa,color:#e5e5e5
    style External fill:#0d1117,stroke:#444,color:#999
    style Engine fill:#7c3aed,stroke:#a78bfa,color:#fff

Routing Architecture

Monday uses a zero-dependency URL routing system built on the HTML5 History API — no React Router, no hash fragments.

All 14 named views and their paths (defined in src/App.tsx):

| View key | URL path |
| --- | --- |
| chat | /monday/ |
| models | /monday/models |
| changelog | /monday/changelog |
| cache | /monday/cache |
| stats | /monday/stats |
| comparison | /monday/comparison |
| benchmark | /monday/benchmark |
| custom-models | /monday/custom-models |
| persona-marketplace | /monday/persona-marketplace |
| knowledge | /monday/knowledge |
| plugins | /monday/plugins |
| mcp-servers | /monday/mcp-servers |
| webdav | /monday/webdav |
| memory | /monday/memory |

How it works:

flowchart LR
    URL["URL\n/monday/…"] -->|popstate| VFP["viewFromPath()\nURL → View enum"]
    VFP --> State["view state\nReact useState"]
    State -->|useEffect| PS["history.pushState"]
    PS --> URL

    subgraph GH["GitHub Pages compat"]
        F["public/404.html\nsaves path → sessionStorage"]
        R["Redirect → /monday/"]
        A["App init reads sessionStorage\nhistory.replaceState"]
        F --> R --> A
    end

    style State fill:#7c3aed,stroke:#a78bfa,color:#fff
    style GH fill:#0d1117,stroke:#444,color:#999

Key behaviours:

  • Calling setView(v) triggers a useEffect that does history.pushState to the mapped URL — the URL bar updates instantly without a page reload.
  • popstate events (browser back / forward) call viewFromPath(pathname) to resolve the URL back into a View and update React state.
  • GitHub Pages 404 compatibility: public/404.html captures the requested path in sessionStorage and redirects to /monday/. On first render App.tsx reads it back and calls history.replaceState to restore the original URL before the SPA mounts.
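The round-trip above can be sketched as plain functions. `VIEW_PATH` and `viewFromPath` are the names from the real src/App.tsx; the exact table shape and fallback behaviour shown here are assumptions for illustration.

```typescript
// Sketch of the URL <-> view mapping (names VIEW_PATH / viewFromPath come from
// src/App.tsx; the exact shape and fallback here are assumptions).
type View =
  | 'chat' | 'models' | 'changelog' | 'cache' | 'stats' | 'comparison'
  | 'benchmark' | 'custom-models' | 'persona-marketplace' | 'knowledge'
  | 'plugins' | 'mcp-servers' | 'webdav' | 'memory';

const VIEW_PATH: Record<View, string> = {
  chat: '/monday/',
  models: '/monday/models',
  changelog: '/monday/changelog',
  cache: '/monday/cache',
  stats: '/monday/stats',
  comparison: '/monday/comparison',
  benchmark: '/monday/benchmark',
  'custom-models': '/monday/custom-models',
  'persona-marketplace': '/monday/persona-marketplace',
  knowledge: '/monday/knowledge',
  plugins: '/monday/plugins',
  'mcp-servers': '/monday/mcp-servers',
  webdav: '/monday/webdav',
  memory: '/monday/memory',
};

// Resolve a pathname back into a View (used on popstate and on first load);
// unknown paths fall back to 'chat'.
function viewFromPath(pathname: string): View {
  const normalized = pathname.replace(/\/+$/, '') || '/monday';
  for (const [view, path] of Object.entries(VIEW_PATH) as [View, string][]) {
    if (path.replace(/\/+$/, '') === normalized) return view;
  }
  return 'chat';
}
```

In the app, `setView(v)` would then call `history.pushState(null, '', VIEW_PATH[v])`, and a `popstate` listener would call `setView(viewFromPath(location.pathname))` to close the loop.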

Rule for adding a new view:

  1. Add the key + path to VIEW_PATH in App.tsx.
  2. Add a view === 'new-view' render branch in the JSX return.
  3. Add a navigation callback to useKeyboardShortcuts and a menu item in Sidebar.

Component Architecture

graph TD
    App["App.tsx<br/>URL Router + Global State"]
    App --> Sidebar["Sidebar<br/>Session List + Nav"]
    App --> Header["Header<br/>Model Badge + Theme"]

    subgraph Views["Routed Views (view state)"]
        V_chat["chat\n ChatLayout"]
        V_models["models\n ModelSelector"]
        V_knowledge["knowledge\n KnowledgePanel"]
        V_plugins["plugins\n PluginManager"]
        V_mcp["mcp-servers\n McpServerManager"]
        V_memory["memory\n MemoryPanel"]
        V_webdav["webdav\n WebDAVSettings"]
        V_stats["stats\n ModelStats"]
        V_cmp["comparison\n ModelComparison"]
        V_bench["benchmark\n ModelBenchmark"]
        V_persona["persona-marketplace\n PersonaMarketplace"]
        V_custom["custom-models\n CustomModelImport"]
        V_cache["cache\n (cache manager)"]
        V_cl["changelog\n Changelog"]
    end

    App --> Views

    Header --> TT["ThemeToggle<br/>Light/Dark/System"]
    Header --> WG["WebGPUCheck"]
    V_chat --> ML["MessageList"]
    V_chat --> CI["ChatInput<br/>BorderBeam textarea"]

    subgraph Hooks["Custom Hooks"]
        useModel["useModel<br/>Load/Unload/Progress"]
        useChat["useChat<br/>Sessions/Messages/Stream"]
        useTheme["useTheme<br/>Light/Dark/System"]
        useKnowledge["useKnowledge / useKnowledgeBases"]
        useVectorStore["useVectorStore<br/>IndexedDB vectors"]
        useEmbedding["useEmbeddingModel<br/>GTE-small MLC"]
        useMcp["useMcpServers"]
    end

    subgraph Lib["Core Library"]
        engine["engine.ts<br/>Web-LLM Singleton"]
        models["models.ts<br/>Model Registry"]
        storage["storage.ts<br/>IndexedDB Ops"]
        changelog["changelog.ts<br/>Version Data"]
    end

    App --> Hooks
    Hooks --> Lib
    engine -->|"CreateMLCEngine"| WEBLLM["@mlc-ai/web-llm"]

    style App fill:#7c3aed,stroke:#a78bfa,color:#fff
    style Views fill:#1a1a2e,stroke:#a78bfa,color:#e5e5e5
    style Hooks fill:#1e3a5f,stroke:#3b82f6,color:#e5e5e5
    style Lib fill:#1a3328,stroke:#22c55e,color:#e5e5e5

Data Flow: Chat Message Lifecycle

sequenceDiagram
    participant User
    participant ChatInput
    participant useChat
    participant engine.ts
    participant WebLLM
    participant IndexedDB

    User->>ChatInput: Type message + Enter
    ChatInput->>useChat: sendMessage(content)
    useChat->>useChat: Create user msg + empty assistant msg
    useChat->>engine.ts: streamChat(history)
    engine.ts->>WebLLM: chat.completions.create(stream:true)

    loop Token Streaming
        WebLLM-->>engine.ts: yield delta token
        engine.ts-->>useChat: yield token
        useChat-->>useChat: Append to assistant msg
        useChat-->>ChatInput: Re-render (streaming)
    end

    useChat->>useChat: Finalize msg, generate title
    useChat->>IndexedDB: saveSessions(updated)
    useChat-->>User: Complete response displayed
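The streaming leg of this lifecycle can be sketched as an async generator. This is a hypothetical sketch: the engine below is a stub standing in for @mlc-ai/web-llm's engine object, and the chunk shape follows the OpenAI-style streaming convention the diagram names (`chat.completions.create(stream:true)`).

```typescript
// Hypothetical sketch of streamChat(): wrap the engine's OpenAI-style
// streaming API as an async generator of delta tokens.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };
type Chunk = { choices: { delta: { content?: string } }[] };

interface ChatEngine {
  chat: { completions: { create(req: { messages: Msg[]; stream: true }): AsyncIterable<Chunk> } };
}

async function* streamChat(engine: ChatEngine, history: Msg[]): AsyncGenerator<string> {
  const stream = await engine.chat.completions.create({ messages: history, stream: true });
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta.content;
    if (token) yield token; // one delta token per chunk
  }
}

// Stub engine that streams a fixed reply token-by-token.
const stub: ChatEngine = {
  chat: { completions: { create: async function* () {
    for (const t of ['Hel', 'lo', '!']) yield { choices: [{ delta: { content: t } }] };
  } } },
};

async function demo(): Promise<string> {
  let assistant = ''; // useChat appends each token to the assistant message
  for await (const token of streamChat(stub, [{ role: 'user', content: 'Hi' }])) {
    assistant += token;
  }
  return assistant;
}
```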

Model Loading Flow

sequenceDiagram
    participant User
    participant ModelSelector
    participant useModel
    participant engine.ts
    participant WebLLM
    participant HuggingFace as HF CDN

    User->>ModelSelector: Click model card
    ModelSelector->>useModel: load(modelId)
    useModel->>useModel: setState(downloading, 0%)

    useModel->>engine.ts: loadModel(modelId, onProgress)
    engine.ts->>WebLLM: CreateMLCEngine(modelId)
    WebLLM->>HuggingFace: Fetch model weights (WASM/WebGPU)

    loop Download Progress
        HuggingFace-->>WebLLM: Chunk data
        WebLLM-->>engine.ts: InitProgressReport
        engine.ts-->>useModel: progress callback
        useModel-->>ModelSelector: Update progress bar
    end

    WebLLM-->>engine.ts: Engine ready
    engine.ts-->>useModel: resolve
    useModel->>useModel: setState(ready, 100%)
    useModel-->>User: Model badge shown ✓
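Under stated assumptions, the load-with-progress flow above can be sketched as follows. `createEngine` is a stand-in for Web-LLM's `CreateMLCEngine`, and the report shape is an assumption; only the singleton-plus-callback pattern is the point.

```typescript
// Sketch of engine.ts's loadModel(): a singleton wrapper that forwards
// init progress reports from the engine factory to the UI.
type InitProgressReport = { progress: number; text: string };
type Engine = { modelId: string };

let engine: Engine | null = null; // module-level singleton

async function loadModel(
  modelId: string,
  onProgress: (report: InitProgressReport) => void,
  createEngine: (id: string, cb: (r: InitProgressReport) => void) => Promise<Engine>,
): Promise<Engine> {
  if (engine && engine.modelId === modelId) return engine; // already loaded
  engine = await createEngine(modelId, onProgress);
  return engine;
}

// Fake engine factory that emits three progress reports, like a download.
async function fakeCreate(id: string, cb: (r: InitProgressReport) => void): Promise<Engine> {
  for (const p of [0, 0.5, 1]) cb({ progress: p, text: `Fetching ${id}: ${p * 100}%` });
  return { modelId: id };
}
```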

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Vite 8 + React 19 + TypeScript 6 |
| AI Runtime | @mlc-ai/web-llm (WebGPU + WASM) |
| UI Effects | border-beam |
| Persistence | IndexedDB (sessions, messages) |
| Deployment | GitHub Pages via GitHub Actions |
| Build | Vite, ESNext target |

Supported Models

| Model | Parameters | Size | Provider |
| --- | --- | --- | --- |
| Qwen 3 0.6B | 0.6B | ~400 MB | Alibaba |
| Qwen 3 1.7B | 1.7B | ~1 GB | Alibaba |
| Qwen 3 4B | 4B | ~2.5 GB | Alibaba |
| Qwen 3.5 0.8B | 0.8B | ~500 MB | Alibaba |
| Qwen 3.5 2B | 2B | ~1.2 GB | Alibaba |
| Qwen 2.5 0.5B | 0.5B | ~350 MB | Alibaba |
| Qwen 2.5 1.5B | 1.5B | ~900 MB | Alibaba |
| Qwen 2.5 3B | 3B | ~1.8 GB | Alibaba |
| Qwen 2.5 Coder 1.5B | 1.5B | ~900 MB | Alibaba |
| SmolLM2 360M | 360M | ~200 MB | HuggingFace |
| SmolLM2 1.7B | 1.7B | ~1 GB | HuggingFace |
| Gemma 2 2B | 2B | ~1.3 GB | Google |
| Gemma 3 4B | 4B | ~2.5 GB | Google |
| Gemma 3 1B | 1B | ~700 MB | Google |
| Phi 3.5 Mini | 3.8B | ~2 GB | Microsoft |
| Phi 4 Mini | 3.8B | ~2.2 GB | Microsoft |
| DeepSeek R1 Distill Qwen 1.5B | 1.5B | ~1 GB | DeepSeek |
| Llama 3.2 1B | 1B | ~700 MB | Meta |
| Llama 3.2 3B | 3B | ~1.8 GB | Meta |
| TinyLlama 1.1B | 1.1B | ~600 MB | Community |
| StableLM 2 Zephyr 1.6B | 1.6B | ~950 MB | Stability AI |
| InternLM 2.5 1.8B | 1.8B | ~1.1 GB | Shanghai AI Lab |
| OLMo 1B | 1B | ~600 MB | Allen Institute |

⭐ = Recommended for most users


Competitive Analysis

The roadmap is informed by deep analysis of these leading AI chat platforms:

| Product | Stars | Key Differentiator | Monday Relevance |
| --- | --- | --- | --- |
| OpenClaw | 364k | Personal always-on AI assistant: multi-channel (WhatsApp/Telegram/Slack/Discord/…), SOUL.md identity, AgentSkills/SKILL.md ecosystem, ClawHub registry (52.7k skills, 12M installs), Skill Workshop AI, per-agent allowlists | Skills system (SKILL.md spec), skill marketplace, SOUL.md persona persistence, self-improving agent memory |
| Open WebUI | 132k | Full-featured self-hosted AI platform: RAG, pipelines, MCP, RBAC, voice/video, image gen | Feature-complete reference for chat UX, RAG, tools |
| NextChat | 88k | Lightweight cross-platform AI client: Vercel deploy, MCP, masks, artifacts, Tauri desktop | Lightweight UX, prompt templates, artifacts rendering |
| LobeHub | 75k | Agent-as-unit-of-work platform: 10k+ plugins, agent groups, personal memory, TTS/STT | Agent system, plugin ecosystem, memory architecture |
| Jan | 42k | Offline desktop ChatGPT: local LLMs via llama.cpp, custom assistants, OpenAI-compatible API | Offline-first philosophy, model management, MCP integration |
| GPT-Runner | 379 | AI presets for code: conversations with code files, IDE integration, version-controlled prompts | Preset system, project-scoped AI configuration |
| Claude Code | 118k | Terminal coding agent: deep codebase understanding, multi-step agentic task execution, bash/git/test tools, CLAUDE.md project config, plugins directory, @claude GitHub tagging, computer_use tool (screenshot→observe→action loop in sandboxed VM) | Agent mode (v0.30), plugin system (v0.27), task brief (CLAUDE.md equivalent), computer-use loop (v1.3) |
| browser-use | 90.4k | LLM-controlled browser automation: Playwright-backed agent, click/type/scroll/navigate/screenshot/extract-text primitives, skills directory, AGENTS.md + CLAUDE.md conventions, CLI, cloud hosting, 100-task real-world benchmark | Browser-use agent (v1.3): action primitives, sandboxed iframe execution loop, DOM-state context, screenshot observation |
| Playwright MCP | 31.4k | MCP server for browser automation: navigate/click/fill/screenshot/DOM tools via accessibility tree (no vision model needed); vision + coordinate-based modes opt-in; used by VS Code, Cursor, Claude Desktop, Codex, Copilot | Playwright MCP bridge (v1.3): connect Monday's v0.27 MCP client to @playwright/mcp for full external-browser control |

Roadmap

North Star (immutable): A local-first, browser-native AI workstation. WebGPU inference + optional remote providers, with first-class memory, tools and offline capability — all running entirely in the user's browser.

Three non-negotiable axes every release must satisfy:

  1. Local-first — every feature works with WebGPU + IndexedDB only; cloud providers are an option, never a requirement.
  2. Phase progression — releases ship the earliest unreleased version in the Versioned task breakdown end-to-end. No skipping versions, no scope outside the listed checkboxes.
  3. Release gate — a version is "done" only when its release gate is green. Trivial built-ins or polish do not unlock the next version.

Versioned task breakdown

The autonomous Cron picks the first unchecked, unblocked item in the earliest unreleased version and ships it end-to-end (code + build green + visible in UI + entry in CHANGELOG). It never invents scope outside this list, and never skips a version. Past versions remain documented as a historical record below.

v0.25 — Knowledge & RAG (storage layer)

Phase 5 of the legacy plan, split for scope safety. RAG is the highest-value unmet feature in the product.

  • Document upload — Upload PDFs / TXT / MD files into a "Knowledge" panel (PDF parsing via pdfjs-dist)
  • Client-side chunking — Split documents into ~500-token chunks in-browser (no server)
  • Browser vector store — IndexedDB-backed vector store with cosine similarity, schema migration registered in storage.ts
  • Knowledge bases — Organize documents into named collections; attach a collection to a session

Release gate: a user uploads a 5-page PDF, sees chunks indexed, and a search box returns the top-K matching chunks (no LLM yet — that's v0.26).

Released: 2026-04-25
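The chunking and similarity-search core of this storage layer can be sketched in a few functions. This is a minimal sketch: the real store lives in IndexedDB, the word-per-token approximation and chunk size are assumptions, and real embeddings only arrive in v0.26.

```typescript
// Sketch of v0.25: ~N-token chunking plus cosine-similarity top-K search
// over in-memory vectors (the real store is IndexedDB-backed).
function chunkText(text: string, maxTokens = 500): string[] {
  const words = text.split(/\s+/).filter(Boolean); // crude: 1 word ≈ 1 token
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(' '));
  }
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Rank stored chunk vectors against a query vector and keep the best k.
function topK(query: number[], store: { id: string; vec: number[] }[], k: number) {
  return [...store]
    .map((e) => ({ id: e.id, score: cosine(query, e.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```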

v0.26 — RAG (retrieval + citation)

  • Embedding model — Run a small embedding model via Web-LLM (e.g. gte-small MLC build) and persist embeddings
  • Semantic search — On send, query the active knowledge base and inject top-K chunks into the system prompt
  • Citation display — Show which chunks were used per assistant message, with click-to-open
  • Citation persistence — Citations survive page reload (stored alongside message in IndexedDB)

Release gate: a question answered using a chunk shows a citation that opens to the exact span of the source document; reload preserves it.

Released: 2026-04-26

v0.27 — Tools, Function calling, MCP

Phase 6 advanced — the only sanctioned tools work. Net-new built-in mini-tools (calculator / clock / unit converter / JSON formatter / one-shot web-search button / standalone formatter) are out of scope: they distract from the function-calling / plugin / MCP work that actually lets users plug in any tool. Mini-tools, if at all, ship later as plugins through the system below.

  • Function calling — Parse model tool-call outputs (OpenAI-style tool_calls JSON) and dispatch to in-browser functions
  • Plugin system — Load third-party tool plugins from URL (JSON manifest declaring name / description / inputSchema / handlerUrl)
  • MCP client — Connect to an MCP server (WebSocket transport) and expose its tools to the model
  • Tool call inspector — A panel that shows the request / response / latency of every tool call in a session

Release gate: a user installs one external plugin from URL or connects to one MCP server, the model invokes a tool from it, and the inspector shows the full request / response.

Released: 2026-04-26
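The function-calling bullet above can be sketched as a small dispatcher. The `tool_calls` shape follows the OpenAI convention the checklist names; the registry, handler signature, and the `add` tool are illustrative assumptions, not the shipped API.

```typescript
// Sketch of v0.27 function calling: parse OpenAI-style tool_calls and
// dispatch each to a registered in-browser handler.
type ToolCall = { id: string; function: { name: string; arguments: string } };
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const registry = new Map<string, ToolHandler>();

async function dispatchToolCalls(calls: ToolCall[]) {
  const results = [];
  for (const call of calls) {
    const handler = registry.get(call.function.name);
    if (!handler) {
      results.push({ id: call.id, error: 'unknown tool' });
      continue;
    }
    const args = JSON.parse(call.function.arguments); // model emits a JSON string
    results.push({ id: call.id, result: await handler(args) });
  }
  return results; // would be fed back to the model as role:"tool" messages
}

// Illustrative handler; real tools come from plugins or MCP servers.
registry.set('add', async (a) => (a.x as number) + (a.y as number));
```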

v0.28 — Collaboration & Sharing

  • Share conversations — Generate a shareable static HTML export (no server)
  • Import/export — Full data import / export (sessions, personas, settings, knowledge bases) as a single .monday zip
  • WebDAV sync — Cross-device sync via user-supplied WebDAV server
  • Shared personas — Publish a persona to a static community registry (curated JSON file in the repo)
  • Conversation forking — Branch a session at any message; branches are siblings, navigable in the sidebar

Release gate: round-trip import → export → re-import preserves every session, persona and knowledge base byte-for-byte.

Released: 2026-04-26

v0.29 — Desktop, PWA polish & shortcuts

  • Update prompt — Banner when a new service worker is installed
  • Offline indicator — Header chip when offline; gracefully disable cloud-only features
  • Background notifications — Notify when a long generation completes while the tab is hidden (uses existing useNotifications)
  • Desktop app — Tauri wrapper that targets macOS / Windows / Linux
  • Keyboard shortcuts overlay — pressing ? opens a list of every shortcut (Cmd+K / Cmd+N / Cmd+⇧S / Cmd+E …); shortcuts are also documented in the README
  • Multi-window — Open a conversation in a separate browser window / Tauri window with shared IndexedDB

Release gate: a Tauri build runs on macOS with full chat + RAG + tools functionality; offline mode degrades gracefully.

v0.30 — Agent mode & analytics

  • Multi-turn memory — Auto-summarize early turns when the context window is exceeded; summaries are visible and editable
  • Agent mode — Multi-step task execution with tool use (an outer planner loop on top of v0.27 function calling)
  • Model chaining — Pipeline: fast model drafts → large model refines, configurable per persona
  • Batch generation — Generate N responses in parallel and pick the best
  • Usage analytics — Local-only dashboard: model usage, tokens consumed, average tps, sessions per day
  • i18n — Multi-language interface (English, 中文, 日本語) with language picker in settings
  • Accessibility — Screen-reader landmarks, keyboard-only navigation, high-contrast theme

Release gate: a documented agent-mode demo solves a 3-step task (search → summarize → save) end-to-end with zero manual intervention.

Released: 2026-04-27
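The agent-mode bullet — an outer planner loop on top of the v0.27 tool layer — can be sketched like this. The planner is stubbed as a plain function; in the real feature that decision comes from the LLM, and the step shape is an assumption.

```typescript
// Sketch of v0.30 agent mode: plan -> act -> observe, with a step budget.
type Step = { tool: string; args: Record<string, unknown> } | { final: string };
type Tool = (args: Record<string, unknown>) => Promise<unknown>;

async function runAgent(
  plan: (observations: unknown[]) => Promise<Step>, // stands in for the LLM
  tools: Map<string, Tool>,
  maxSteps = 8, // budget so the loop always terminates
): Promise<string> {
  const observations: unknown[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await plan(observations);
    if ('final' in step) return step.final; // planner decided it is done
    const tool = tools.get(step.tool);
    observations.push(tool ? await tool(step.args) : `no such tool: ${step.tool}`);
  }
  return 'step budget exhausted';
}
```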

v0.31 — Code Arena / Showdown Mode

A richer evolution of the existing Model Comparison view, inspired by WebDev Arena, Design Arena and the indie "Grass Field challenge" rigs that show up in Twitter dual-pane screenshots: same prompt → two models → live HTML/canvas preview → shareable recording. Net-new vs. v0.2's plain text-only comparison.

  • Dual artifact panes — Side-by-side terminal-style cards with provider badge, model name, status (pending / streaming / done) and generation duration in seconds
  • Sandboxed iframe preview — Each pane mounts the streamed HTML/CSS/JS into a sandbox="allow-scripts" iframe, refreshed on every chunk (debounced) and on a manual ↻ Run button
  • Code ↔ Preview tabs — Per-pane toggle between rendered preview and source view, with a Copy button on each
  • Synchronized scroll — Code view in both panes scrolls in lockstep (line-aligned) to make diffs obvious
  • Challenge prompt library — Curated presets (Grass Field, Solar System, Pelican on a Bicycle, Tetris, Snake, Bouncing Balls, Particle System, CSS Loader Gallery), one-click load into the arena
  • Recording & video export — MediaRecorder captures both iframes as a synchronized timelapse .webm (default 30 fps, configurable), with a small "@username" watermark from settings
  • PNG share card — Export a single PNG with both final previews, model names, durations and watermark — sized for Twitter (16:9)
  • Verdict & local leaderboard — Team A / Tie / Team B voting UI; results persisted in IndexedDB and aggregated into a per-model win/tie/loss table (purely local, no upload)

Release gate: a user picks two models, loads the "Grass Field" preset, hits Send, sees both iframes animate side-by-side, exports a .webm with watermark, votes a winner, and the leaderboard updates.

Released: 2026-04-29

v1.0 — External LLM Providers & Web Search (stable)

The "1.0" promise: anything saved in v1.0 keeps working until v2.0.

  • OpenAI-compatible API — Configure any OpenAI-compatible endpoint (custom base URL + API key, stored encrypted in IndexedDB)
  • Ollama integration — Connect to a local Ollama server (http://localhost:11434) with model auto-discovery
  • LM Studio — Connect to LM Studio's local OpenAI-compatible server
  • llama.cpp server — Connect to llama.cpp --server HTTP mode
  • vLLM — Connect to a vLLM inference endpoint
  • DeepSeek API — First-class DeepSeek cloud provider (chat + reasoner models)
  • Provider switcher — Per-session toggle between WebGPU local inference and external API providers
  • SearXNG integration — Web search via a user-supplied SearXNG URL
  • Stable storage schema v1 — Migration registry frozen; future migrations must add, not break, fields

Release gate: a 24-hour soak test (1 hour with each provider) passes; the storage migration test from v0.25 → v1.0 round-trips without loss.

Released: 2026-04-30
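All the providers listed above (Ollama, LM Studio, llama.cpp --server, vLLM, DeepSeek, any custom endpoint) speak the same OpenAI-compatible shape, so the provider layer reduces to building one request. A hedged sketch of that builder follows; the helper name and return shape are illustrative, and sending it is just `fetch(url, { method: 'POST', headers, body })`.

```typescript
// Sketch of the v1.0 provider layer: one request builder for any
// OpenAI-compatible endpoint (path /v1/chat/completions).
type ChatMsg = { role: 'system' | 'user' | 'assistant'; content: string };

function buildChatRequest(
  baseUrl: string,
  apiKey: string | null, // null for local servers like Ollama
  model: string,
  messages: ChatMsg[],
) {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (apiKey) headers['Authorization'] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    headers,
    body: JSON.stringify({ model, messages, stream: true }),
  };
}
```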

v1.1 — Skills System

Inspired by OpenClaw's AgentSkills/SKILL.md ecosystem and ClawHub (52.7k tools, 12M downloads). Skills sit between personas (identity) and plugins (tools): a skill is a structured capability pack that teaches the model how to behave in a specialized domain — e.g. "Python Debugger", "Technical Writer", "SQL Analyst". Multiple skills can be stacked in one session.

Persona = who the AI is. Plugin = what tools the AI has. Skill = what the AI knows how to do (domain instructions, workflow steps, required-plugin declarations).

  • Skill format — Skill spec stored in IndexedDB: name, description, instructions (markdown injected into system prompt), requiredPlugins (list of plugin URLs/IDs), version, tags, icon
  • Skill composer — Per-session skill panel: attach 1–N skills alongside a persona; active skills shown as chips in the session header; skill instructions appended to the system prompt before each turn
  • Skill registry — Community skill registry (curated JSON file in the repo, like persona registry) with 20+ launch skills across categories: Coding, Writing, Research, Data, Language, Creative
  • Skill builder UI — In-app skill editor: name, description, tag picker, markdown instructions with live token-count estimate, required-plugin picker, export as .monday-skill JSON
  • Skill + plugin binding — A skill can declare required plugins by URL/ID; installing a skill from the registry auto-prompts to install any missing plugins (same flow as v0.27 plugin install)
  • SOUL.md equivalent — "Soul" tab in the persona editor: a persistent cross-session identity prompt that survives /new and session resets; stored in IndexedDB alongside the persona; separate from the per-session system prompt
  • Skill marketplace UI — Browse/search/install from the community registry; show tags, install count (local-only counter), author; one-click install
  • Skill hot-reload — Changes to an active skill take effect on the next message send (no session restart required)

Release gate: a user installs a "Python Debugger" skill from the registry, attaches it to a new session alongside a persona, sends a debugging question, and the model follows the skill's specialized workflow; the skill persists on page reload; the session header shows the active skill chip.

Released: 2026-05-03
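The composition rule above (soul → persona → stacked skills, appended to the system prompt before each turn) can be sketched as a pure function. This is a hypothetical sketch: the field names mirror the skill spec in this section, but the joining format is an assumption.

```typescript
// Sketch of the v1.1 skill composer: build the effective system prompt from
// a persona's soul + system prompt plus each attached skill's instructions.
type Skill = { name: string; instructions: string };
type Persona = { systemPrompt: string; soul?: string };

function composeSystemPrompt(persona: Persona, skills: Skill[]): string {
  const parts: string[] = [];
  if (persona.soul) parts.push(persona.soul); // cross-session identity, survives /new
  parts.push(persona.systemPrompt);
  for (const s of skills) parts.push(`## Skill: ${s.name}\n${s.instructions}`);
  return parts.join('\n\n');
}
```

Because this runs before every send, editing an active skill naturally hot-reloads on the next message, as the checklist requires.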

v1.2 — Self-Improving Agent & Persistent Memory

Inspired by the top-trending ClawHub skills: self-improving-agent (411k downloads), ontology typed memory graph (171k downloads), and self-improving + proactive agent (174k downloads). All state is local-only — nothing leaves IndexedDB.

  • Persistent memory store — Cross-session key-value memory backed by IndexedDB; the model can read memories at session start and write new ones during the conversation; memories panel shows all entries with edit/delete
  • Memory namespaces — Memories scoped to three levels: global (all sessions), per-persona, per-skill; the active session inherits the union of applicable namespaces
  • Correction capture — When a user edits or regenerates a message, optionally record the correction as a named memory entry ("Prefer concise answers", "Always use TypeScript strict mode"); visible in the memories panel
  • Ontology store — Typed entity graph: Person, Project, Task, Event, Document; entities have properties + relationships; browsable/editable in a side panel; injected as a compact context block when relevant entities are mentioned
  • Session compaction with learning — When compacting long sessions (v0.30 multi-turn memory), extract preference signals and entity mentions into the memory store, not just a plain summary; user reviews before committing
  • Skill Workshop (browser edition) — After a session ends, the model proposes skill refinements based on corrections, regenerations, and user edits; proposals shown in a diff view; user approves → saved to the relevant skill in IndexedDB
  • Memory-aware personas — A persona can declare which memory namespaces it reads on activation (e.g. "global" + "per:this-persona"); persona editor shows a memory preview panel

Release gate: after 3 sessions with a persona, the memory panel shows ≥5 automatically captured preferences; a Skill Workshop proposal is generated, approved, and the next session reflects the updated skill instructions.

Released: 2026-05-03
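The namespace rule above — a session inherits the union of global, per-persona, and per-skill memories — can be sketched as a filter. The key formats ("global", "persona:&lt;id&gt;", "skill:&lt;id&gt;") are assumptions for illustration.

```typescript
// Sketch of v1.2 namespace resolution over the IndexedDB-backed memory store.
type MemoryEntry = { namespace: string; key: string; value: string };

function memoriesForSession(
  all: MemoryEntry[],
  personaId: string | null,
  skillIds: string[],
): MemoryEntry[] {
  const active = new Set<string>(['global']); // global always applies
  if (personaId) active.add(`persona:${personaId}`);
  for (const id of skillIds) active.add(`skill:${id}`);
  return all.filter((m) => active.has(m.namespace));
}
```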

v1.3 — Browser-Use & Computer-Use (In-Browser Agent Loop)

Inspired by Claude Code's computer_use tool (screenshot → observe → action loop in a sandboxed VM), browser-use (90.4k ⭐ — LLM-controlled Playwright agent with skills directory and action primitives), Playwright MCP (31.4k ⭐ — accessibility-tree MCP server used by VS Code / Cursor / Codex), and Codex CLI's sandboxed bash/file/edit execution model. Extends the v0.30 agent loop and v0.27 MCP client into a full browser-use and in-browser computer-use system.

Three execution tiers of increasing capability:

  • Tier 1 — Sandboxed iframe agent: model generates HTML/CSS/JS → renders in a sandbox="allow-scripts" iframe → html2canvas screenshot → model observes → next action. 100% in-browser, zero external dependencies.

  • Tier 2 — DOM-state computer-use: serialize the active iframe's accessibility tree to compact JSON and inject as a context block; model issues action commands (click, type, scroll, navigate); dispatcher translates to DOM events. Inspired by Playwright MCP's accessibility-tree approach — no vision model required.

  • Tier 3 — Playwright MCP bridge: connect Monday's v0.27 MCP client to a locally-running @playwright/mcp server; full external-browser control (navigate real URLs, fill forms, run tests) with model in the loop; every action logged in the existing tool-call inspector.

  • Agent action primitives — navigate, click, type, scroll, extract-text, take-screenshot, read-dom; each is a named MCP-style tool callable by the model via the v0.27 function-calling layer

  • Sandboxed iframe execution loop (Tier 1) — generate → render in sandbox="allow-scripts" iframe → html2canvas screenshot → attach as image to next LLM call → iterate; debounced auto-refresh + manual ↻ Run; reuses the iframe infra planned for v0.31 Code Arena

  • DOM-state capture (Tier 2) — serialize active iframe's accessibility tree (ARIA roles, labels, input states) to compact JSON injected into context before each model turn; depth + node-count budget to stay token-safe

  • Vision mode (Tier 1/2) — OffscreenCanvas / html2canvas screenshot attached as base64 image in the next LLM call; requires a multimodal model (e.g. Qwen-VL); falls back to DOM-state mode for non-vision models automatically

  • Playwright MCP bridge (Tier 3) — one-click connect in the MCP panel; Monday auto-discovers @playwright/mcp if already configured; domain allowlist + blocked-origins enforced per task brief

  • Task brief (AGENTS.md / CLAUDE.md equivalent) — per-task markdown config declaring goal, allowed domains, step budget, and stop criteria; stored in IndexedDB; shown as a collapsible header above the agent thread

  • Agent audit trail — chronological log of every action + observation + screenshot thumbnail; collapsible per step inside the chat thread; inspired by Codex CLI's terminal-log citation model and Codex Web's task-delegation audit view

  • Async task queue — "delegate and come back" UI inspired by Codex Web: submit a browser task, minimize the panel, get notified via the v0.29 background notification system when the agent finishes or needs human input

  • Sandbox security model — Tier 1: sandbox="allow-scripts" only (no allow-same-origin); Tier 3: domain allowlist + --blocked-origins forwarded to Playwright MCP; credentials redacted in audit trail logs (mirrors browser-use's fill() debug-log redaction practice)

Release gate: a user opens the agent panel, gives the task "fill in the sandboxed form and click Submit", the agent executes ≥5 actions (screenshot → click → type → submit → screenshot), the audit trail shows every step with thumbnails, the async task queue marks it done and triggers a notification, and the final iframe state is visible in the panel.

Released: 2026-05-12
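The Tier 2 DOM-state capture can be sketched as a budgeted tree walk. The node shape below is a mock, not the real accessibility API; only the depth and node-count budgeting that keeps the context block token-safe is the point.

```typescript
// Sketch of Tier 2: serialize a (mock) accessibility tree to compact JSON
// under a depth and node-count budget.
type A11yNode = { role: string; name?: string; children?: A11yNode[] };

function serializeTree(root: A11yNode, maxDepth = 4, maxNodes = 50): object {
  let count = 0;
  function walk(node: A11yNode, depth: number): object | null {
    if (depth > maxDepth || count >= maxNodes) return null; // budget exhausted
    count++;
    const out: any = { role: node.role };
    if (node.name) out.name = node.name;
    const kids = (node.children ?? []).map((c) => walk(c, depth + 1)).filter(Boolean);
    if (kids.length) out.children = kids;
    return out;
  }
  return walk(root, 0)!;
}
```

The resulting JSON would be injected as a context block before each model turn, and the model's click/type/scroll commands dispatched back against the matching DOM nodes.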

v1.4 — Persona Marketplace & Image Input

Extending the persona system (v1.1) with community discovery and multimodal input.

  • Persona marketplace browsing — Browse community personas from the curated registry (already exists as PERSONA_REGISTRY); add search/filter by category, sort by install count; one-click install to the local persona store; shows persona preview (system prompt snippet, params, soul) before installing
  • Image input — Paste or drop an image into the chat input; for vision-capable models, the image is attached as a base64 data URL in the next LLM call; non-vision models show a graceful "vision not available" message ✅
  • Full PWA — Service worker with cache-first strategy for app shell + model weights; offline fallback page; install banner on repeat visits from desktop ✅

Release gate: a user browses the persona marketplace, installs a new persona, switches to it in a session, and the persona's soul and system prompt are active; a user pastes an image into chat and the vision model processes it.

Released: 2026-05-13

v1.5 — Voice & TTS — Multimodal I/O

Voice input and text-to-speech output for hands-free interaction, inspired by Open WebUI's voice features and NextChat's voice support.

  • Voice input — Browser Speech Recognition API for voice-to-text in the chat input; real-time transcription with interim results shown as placeholder text; stop button to end recording; automatic send on silence detection (configurable timeout) ✅
  • TTS output — Web Speech API text-to-speech for assistant responses; per-message play/pause/stop controls; voice selector (if available); auto-play toggle in settings; graceful fallback message when TTS is not supported ✅

Release gate: a user speaks into the microphone, sees real-time transcription, and the text is sent as a message; a user plays TTS on an assistant message and the browser speaks the response.

Released: 2026-05-13

v1.6 — Context Injection

Allow users to attach reusable text and code snippets to any session. Snippets are injected into the system prompt before each turn, giving the model persistent context without requiring full RAG.

  • Context library — Create, name, and organize text/code snippets; each snippet has a title, content (markdown), and optional category tag
  • Session context attachment — Attach one or more snippets to a session; attached snippets appear as a collapsible context block in the chat header ✅
  • Context injection — Attached snippets are prepended to the system prompt before each turn; context is visible in a "Context" panel alongside the message thread
  • Quick context — One-click context templates (e.g. "Project README", "API Reference", "Coding Standards") loaded from a built-in catalog ✅
  • Context search — Search the snippet library by title and content; filter by category

Release gate: a user creates a snippet, attaches it to a session, and the model's response reflects knowledge of the snippet content.

Released: 2026-05-13

Cross-cutting standing rules

These apply to every version and are enforced by the cron:

  1. No "miscellaneous mini-tool" releases. A built-in tool that takes <1 day to implement (calculator, clock, formatter, converter, one-shot web-search button) does not count as a version and must not be added directly. Such utilities ship later as first-class plugins via the v0.27 plugin / MCP system, not as bespoke React components.
  2. HEARTBEAT.md cites the current target version + the exact checkbox(es) in flight. The Next Steps list is taken verbatim from this file, not invented.
  3. Local-first invariant — every new feature must work with the default WebGPU + IndexedDB stack; remote providers are additive.
  4. Storage schema is versioned — any IndexedDB schema change ships with a forward migration in src/lib/storage.ts.
  5. No skipping versions — if v0.25 is unfinished, work on v0.25 only. If every checkbox in the current version is blocked, the cron must spend the slot on tests, docs, refactors or accessibility for that version, not on a later version or on net-new mini-tools.
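Rule 4's forward-migration requirement typically follows an ordered-migration pattern. A schematic sketch in plain TypeScript (real IndexedDB migrations run inside an `onupgradeneeded` handler; the names here are illustrative, not the contents of src/lib/storage.ts):

```typescript
interface Migration {
  toVersion: number;
  migrate: (db: Record<string, unknown>) => void; // mutates schema state
}

// Apply every migration newer than the stored version, in ascending
// order, and return the resulting schema version. This mirrors how an
// IndexedDB onupgradeneeded handler walks oldVersion up to newVersion.
export function runMigrations(
  db: Record<string, unknown>,
  fromVersion: number,
  migrations: Migration[],
): number {
  const pending = migrations
    .filter((m) => m.toVersion > fromVersion)
    .sort((a, b) => a.toVersion - b.toVersion);
  for (const m of pending) m.migrate(db);
  return pending.length ? pending[pending.length - 1].toVersion : fromVersion;
}
```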

Historical phases (for reference)

Phases 1–3, Phase 0.8 and parts of Phases 4 / 6 shipped in v0.2 → v0.21. They remain documented below as a record but are not authoritative for future work — the versioned task breakdown above is.

Note: v0.22–v0.24 (Calculator / Web Search / Unit Converter / JSON Formatter / Current Time) were rolled back on 2026-04-25 because they bypassed this Roadmap. Those features will return only as plugins via v0.27.

Phase 1 — Core Chat Enhancement (v0.2.x)

Bring chat to feature parity with basic ChatGPT UX

  • Markdown rendering — Render assistant responses with proper Markdown, code blocks, syntax highlighting
  • Code copy button — One-click copy for code blocks
  • LaTeX support — Math equation rendering with KaTeX
  • System prompt — Customizable system prompt per session
  • Generation params — Temperature, top_p, max_tokens sliders
  • Auto-scroll control — Pause auto-scroll when user scrolls up
  • Chat export — Export conversations as Markdown/JSON
  • Token counter — Display tokens/sec and total token usage
  • Message actions — Copy, regenerate, edit user messages

Phase 2 — Model Management (v0.3.x)

Rich model lifecycle and expanded model support

  • Model cache manager — View/delete cached models, show disk usage
  • More models — Add Llama 3.2 1B/3B, DeepSeek-R1-Distill, Mistral 7B, Stable Code 3B
  • Model benchmarks — Auto-run speed benchmark on load, show tokens/sec
  • Custom model import — Load custom MLC-compiled models from URL
  • Model comparison — Side-by-side generation from two models
  • Download resume — Resume interrupted model downloads with progress persistence
  • Storage quota — Show browser storage used vs available

Phase 3 — Prompt Templates & Personas (v0.7.x)

Inspired by NextChat masks, GPT-Runner presets, LobeHub agents

  • Prompt templates — Pre-built conversation starters (coding assistant, translator, tutor, etc.)
  • Custom personas — Create/save/share AI personas with system prompts + params
  • Persona marketplace — Browse community-shared personas (static JSON registry)
  • Quick prompts — Slash commands (/translate, /code, /explain) in chat input
  • Context injection — Attach text/code snippets as context before sending

Phase 4 — Multimodal & Rich Input (v0.5.x)

Add vision and file capabilities as models support them

  • Image input — Paste/upload images for vision models (when WebGPU vision models available)
  • File upload — Attach text files as conversation context
  • Drag & drop — Drag files directly into chat
  • Clipboard paste — Intelligent paste handling (images, code, rich text)
  • Voice input — Browser Speech Recognition API for voice-to-text
  • TTS output — Read assistant responses aloud via Web Speech API

Phase 5 — Knowledge & RAG (v0.6.x)

Local-first retrieval augmented generation, inspired by Open WebUI RAG

  • Document upload — Upload PDFs, TXT, MD files
  • Client-side chunking — Split documents into chunks in-browser
  • Browser vector store — IndexedDB-based vector storage
  • Embedding model — Run small embedding model via Web-LLM
  • Semantic search — Query uploaded documents before sending to LLM
  • Citation display — Show which document chunks were used in response
  • Knowledge bases — Organize documents into named collections
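The client-side chunking bullet above usually means fixed-size windows with overlap, so a sentence that straddles a boundary is still retrievable from either side. A sketch under that assumption (parameters and function name are illustrative):

```typescript
// Split a document into overlapping chunks for embedding. `size` is the
// chunk length in characters; `overlap` characters are repeated between
// consecutive chunks so boundary-spanning text stays retrievable.
export function chunkDocument(text: string, size = 512, overlap = 64): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded in-browser and stored alongside its source offsets in IndexedDB, which is what makes the citation-display bullet possible.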

Phase 6 — Tools & Plugins (v0.7.x)

Function calling and tool use, inspired by LobeHub plugins and Open WebUI tools

  • Function calling — Parse model tool-call outputs and execute browser-side functions
  • Built-in tools — Calculator, current time, unit converter, JSON formatter
  • Web search — Browser-side web search integration (via public APIs)
  • Code execution — Sandboxed JavaScript execution in iframe
  • Artifacts — Render generated HTML/SVG/Mermaid in preview panel (like NextChat artifacts)
  • Plugin system — Load third-party tool plugins from URL (JSON manifest)
  • MCP client — Model Context Protocol support for external tool servers
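The function-calling bullet above generally means scanning model output for a structured tool-call block and dispatching it to a browser-side registry. A hedged sketch where the `<tool_call>` delimiter and registry shape are assumptions, not Monday's actual protocol:

```typescript
type Tool = (args: Record<string, unknown>) => unknown;

// Extract a tool call like <tool_call>{"tool":"...","args":{...}}</tool_call>
// from model output and run it against a registry of browser-side functions.
// Returns the tool's result, or undefined if no valid call was found.
export function runToolCall(
  output: string,
  registry: Record<string, Tool>,
): unknown {
  const match = output.match(/<tool_call>([\s\S]*?)<\/tool_call>/);
  if (!match) return undefined;
  let call: { tool?: string; args?: Record<string, unknown> };
  try {
    call = JSON.parse(match[1]);
  } catch {
    return undefined; // malformed JSON: ignore rather than crash the UI
  }
  const fn = call.tool ? registry[call.tool] : undefined;
  return fn ? fn(call.args ?? {}) : undefined;
}
```

The result would then be appended to the conversation as a tool message so the model can continue with it.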

Phase 0.8 — Personalization & Discovery (v0.8.x)

Personalized experience and easier conversation discovery

  • Model usage tracking — Automatically track which models you use most
  • Recommended models — Top 3 most-used models displayed in Model Selector
  • Reset recommendations — Clear usage history to reset model recommendations
  • Session search — Search conversations by title in the sidebar
  • Date filtering — Filter sessions by Today, Yesterday, This Week, This Month
  • Model usage stats — Visual chart of model usage frequency
  • Recent models — Quick access to recently used models

Phase 7 — Collaboration & Sharing (v0.9.x)

Social features inspired by LobeHub channels and Open WebUI community

  • Share conversations — Generate shareable link (static HTML export)
  • Import/export — Full data import/export (sessions, personas, settings)
  • WebDAV sync — Sync data across devices via WebDAV (like NextChat)
  • Shared personas — Publish personas to community registry
  • Conversation forking — Branch a conversation at any message

Phase 8 — Desktop & PWA (v0.9.x)

Expand beyond browser tab, inspired by Jan desktop and NextChat Tauri

  • Full PWA — Offline-capable progressive web app with service worker
  • Install prompt — Smart install banner for mobile and desktop
  • Notifications — Background generation completion notifications
  • Desktop app — Tauri wrapper for native macOS/Windows/Linux
  • Keyboard shortcuts — Full keyboard navigation (Cmd+K, Cmd+N, etc.)
  • Multi-window — Open conversations in separate windows/tabs

Phase 9 — Advanced AI Features (v1.0.x)

Towards a complete local AI workstation

  • Multi-turn memory — Compress long conversations for extended context
  • Agent mode — Multi-step task execution with tool use
  • Model chaining — Pipeline: fast model drafts → large model refines
  • Batch generation — Generate multiple responses and pick best
  • A/B testing — Compare model outputs with user ratings
  • Usage analytics — Local analytics dashboard (model usage, tokens, sessions)
  • i18n — Multi-language interface (English, 中文, 日本語, etc.)
  • Accessibility — Screen reader support, keyboard navigation, high contrast

Phase 10 — External LLM Providers & Web Search (v1.1.x)

Connect to cloud and local AI servers alongside native WebGPU inference

  • OpenAI-compatible API — Configure any OpenAI-compatible endpoint (custom base URL + API key)
  • Ollama integration — Connect to a local Ollama server (http://localhost:11434)
  • LM Studio — Connect to LM Studio's built-in OpenAI-compatible local server
  • llama.cpp server — Connect to llama.cpp's built-in OpenAI-compatible HTTP server (llama-server)
  • vLLM — Connect to a vLLM inference server endpoint
  • DeepSeek API — First-class DeepSeek cloud API provider (chat + reasoner models)
  • Provider switcher — Toggle between WebGPU local inference and external API providers in-session
  • SearXNG integration — Web search via a self-hosted SearXNG instance URL
  • Web search tool — Inject search results as context before sending to the model
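The server integrations above all speak the same OpenAI-compatible `/v1/chat/completions` shape, which is what makes one provider switcher possible. A sketch of building such a request (field names follow the OpenAI chat API; the helper itself is illustrative):

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build a fetch-ready request for any OpenAI-compatible endpoint
// (Ollama, LM Studio, llama.cpp server, vLLM, or a cloud API).
export function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`; // local servers often need no key
  return {
    url: `${baseUrl.replace(/\/$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages, stream: true }),
    },
  };
}
```

With `stream: true` the response body arrives as server-sent events, which slots into the same token-by-token rendering path as local WebGPU inference.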

Keyboard Shortcuts

Shortcut   Action
⌘K         Toggle Command Palette
⌘N         New Chat
⌘⇧S        Stop Generation
⌘1         Models
⌘2         Model Cache
⌘3         Usage Statistics
⌘4         Persona Marketplace
⌘5         Knowledge
⌘6         Model Comparison
⌘7         Model Benchmark
⌘8         Custom Model Import
⌘9         Plugins
⌘0         MCP Servers
⌘⇧E        Export All Data
⌘⇧I        Import Data
?          Keyboard Shortcuts Overlay

On Windows/Linux, replace ⌘ with Ctrl.


Development

npm install          # Install dependencies
npm run dev          # Start dev server (http://localhost:5173)
npm run build        # Production build to dist/
npm run preview      # Preview production build

Requirements

  • Chrome 113+ or Edge 113+ (WebGPU support required)
  • GPU with 2GB+ VRAM recommended
  • ~200MB–2GB storage per model (cached in browser)

Project Structure

monday/
├── public/
│   ├── favicon.svg            # App icon (purple gradient smiley)
│   ├── apple-touch-icon.svg   # iOS home screen icon
│   └── manifest.json          # PWA manifest
├── src/
│   ├── App.tsx                # Root: view router, state orchestration
│   ├── App.css                # All component styles
│   ├── components/
│   │   ├── Sidebar.tsx        # Session list, brand, version link
│   │   ├── ModelSelector.tsx  # Model cards with BorderBeam
│   │   ├── MessageList.tsx    # Chat message rendering
│   │   ├── ChatInput.tsx      # Input textarea with send/stop
│   │   ├── Changelog.tsx      # Expandable release history
│   │   ├── ThemeToggle.tsx    # Light/Dark/System switcher
│   │   └── WebGPUCheck.tsx    # WebGPU compatibility warning
│   ├── hooks/
│   │   ├── useChat.ts         # Session/message/streaming state
│   │   ├── useModel.ts        # Model load/unload/progress
│   │   └── useTheme.ts        # Theme persistence + system detection
│   ├── lib/
│   │   ├── engine.ts          # Web-LLM singleton, streamChat()
│   │   ├── models.ts          # Pre-configured model registry
│   │   ├── storage.ts         # IndexedDB CRUD
│   │   └── changelog.ts       # Version history data
│   └── types/
│       └── index.ts           # TypeScript interfaces
├── index.html                 # Entry HTML with mobile meta tags
├── vite.config.ts             # Vite config (base: '/monday/')
└── package.json               # v0.1.0

License

MIT
