Skip to content

jagnani73/insidepoly

Repository files navigation

InsidePoly

Real-time insider-trading detector for Polymarket prediction markets on Polygon. Every OrderFilled event is ingested, every trader is scored 0–100 across five behavioural signals, and suspicious wallets surface on a live leaderboard.

Full product requirements: objective.pdf


Stack

Packages

Package Role
packages/common Shared library. Drizzle schema + migrations, ERC20 contracts, AppError, shared TypeScript types (Wallet, Trade, Market, WalletStats, …). Sub-path exports: common/database, common/contracts, common/errors, common/types. Must be built before backend or frontend.
packages/backend Express API on port 8000. Static singleton services, WebSocket server (socket.io), scoring engine, and the full subgraph ingestion pipeline.
packages/frontend Next.js single-page leaderboard UI. TanStack React Query polling + socket.io WebSocket subscriptions. No client-side routing.

Databases

Database Role
Supabase / PostgreSQL Source of truth. All persistent state: markets, trades, wallets, score history. Queried directly via Drizzle ORM — no views, no JSONB, no materialized caches.
Redis Ephemeral operational layer. Never the source of truth — always rebuildable from Postgres. Holds job queues and sync cursors.

External Data Sources

Source What is queried Purpose
Polymarket Subgraph — trades orderFilledEvents · marketDatas All trade history. Resolves token IDs → condition IDs + outcome indices. Requires GRAPH_API_KEY.
Polymarket Subgraph — metadata questions Market titles, creation timestamps (blockTimestamp), neg-risk flags. Requires GRAPH_API_KEY.
Alchemy RPC — Polygon mainnet (chain 137) alchemy_getAssetTransfers Wallet's first-ever USDC.e receipt on Polygon — used for the Wallet Age signal (first_tx_at). Requires RPC_URL_137.

Both subgraph endpoints sit behind The Graph Gateway and use GRAPH_API_KEY as a Bearer token.

Background Jobs

All jobs are managed by JobsService and must be started before SubgraphService.init() so consumers are ready before data arrives.

Job Type Interval What it does
Enrichment Loop Continuous BRPOP (1 s timeout) Pops raw OrderFilled events from raw:events:queue, resolves token IDs → condition IDs via Subgraph, upserts markets / wallets / trades to Postgres, queues wallets for scoring and RPC backfill.
Scoring Loop Continuous SPOP batch (1 s idle) Pops up to 250 addresses from scoring:pending, calls score_wallets() PostgreSQL function, writes results to wallets + wallet_scores_history.
Realtime Sync Interval 1 min Fetches new orderFilledEvents from the Subgraph since the last cursor, pushes them to raw:events:queue.
Volume Refresh Interval 5 min Recomputes total_volume_usdc for every market in Postgres.
RPC Backfill Sweep Interval 15 min Pops wallet addresses from rpc:backfill:queue, fetches first_tx_at via Alchemy, updates wallets, re-queues updated wallets for scoring.
Rescore Sweep Interval 6 hrs Pushes all Watchlist+ wallets back onto scoring:pending to keep scores current as market lifecycles progress.

Entities

Market

What it is: A Polymarket prediction market, identified by its condition ID (0x…).

Source: Two Subgraph calls — the trades subgraph resolves marketDatas (token ID → condition ID + outcome index); the metadata subgraph resolves questions (title, blockTimestamp, isNegRisk). Fetched during enrichment as new token IDs appear.

Storage: markets table — id (condition ID, text PK), question, started_at, is_neg_risk, total_volume_usdc.

Used for: Entry timing signal (market lifecycle boundaries: started_atMAX(filled_at)). Volume refresh. Enriching trade records with market context for the UI.


Wallet

What it is: An EVM address that has placed at least one Polymarket trade.

Source: Extracted from orderFilledEvents.maker. first_tx_at is fetched asynchronously by the RPC backfill sweep via alchemy_getAssetTransfers.

Storage: wallets table — address (unique), current_score, score_tier, first_tx_at, first_trade_at, total_volume_usdc, plus one denormalized column per signal (signal_concentration_pts, signal_market_count_pts, signal_minimum_size_pts, signal_entry_timing_pts, signal_wallet_age_pts). Signal columns are duplicated here so the leaderboard query requires no joins.

Used for: Scoring. Leaderboard ranking.


Trade

What it is: A single OrderFilled event from the Polymarket CTF Exchange (0x4bFb41d5B3570DeFd03C39a9A4D8dE6Bd8B8982E).

Source: Polymarket Subgraph (trades endpoint) — orderFilledEvents paginated ascending by ID. Deduplicated on tx_hash.

Storage: trades table — wallet_address, market_id, size_usdc, entry_price (0–1 fraction), side (YES/NO), filled_at, tx_hash.

Used for: Computing all five scoring signals. Market volume aggregation. Trade history panel in the UI.


Wallet Score History

What it is: A point-in-time snapshot of a wallet's insider score and per-signal breakdown, recorded whenever the score is recomputed.

Source: Produced by the scoring engine after every score_wallets() call.

Storage: wallet_scores_history table — wallet_address, score, one column per signal, trigger (new_trade | cron), recorded_at.

Used for: Score sparkline chart in the wallet detail panel. Auditing why a score changed.


Startup & Data Flow

Service Initialisation

flowchart TD
    A["pnpm dev:backend"] --> B

    subgraph B["Parallel Init"]
        direction LR
        C["DatabaseService<br/>Supabase / Postgres"]
        D["RedisService<br/>2 named ioredis clients"]
        E["ViemService<br/>Polygon RPC"]
        F["WSService<br/>socket.io namespaces"]
    end

    B --> G["JobsService.start()<br/>4 intervals + 2 loops"]
    G --> H["Express listen :8000"]
    H --> I["SubgraphService.init()<br/>historical bulk sync — paginate all OrderFilled events"]
    I --> J["syncMode = realtime<br/>JobsService 1-min sync takes over"]
Loading

Data Pipeline

flowchart TD
    SG1["Subgraph<br/>trades"] -->|"orderFilledEvents<br/>bulk on boot + delta 1 min"| Q1[/"raw:events:queue<br/>Redis List"/]
    Q1 -->|"BRPOP"| EL["Enrichment Loop"]
    SG2["Subgraph<br/>metadata"] -->|"resolve tokenId<br/>→ conditionId + title"| EL
    EL -->|"upsert<br/>markets · wallets<br/>trades"| PG[("PostgreSQL")]
    EL -->|"SADD"| Q2[/"scoring:pending<br/>Redis Set"/]
    EL -->|"SADD"| Q3[/"rpc:backfill:queue<br/>Redis Set"/]
    Q3 -->|"sweep 15 min"| RPC["Alchemy RPC<br/>Polygon"]
    RPC -->|"first_tx_at"| PG
    RPC -->|"re-queue"| Q2
    Q2 -->|"SPOP 250"| SL["Scoring Loop"]
    SL -->|"score_wallets()<br/>PostgreSQL fn"| PG
    PG -->|"every 15 s<br/>+ on subscribe"| STAT["StatsService<br/>→ /stats WS"]

    STAT -->|"WS data event<br/>tier distribution"| FE["Frontend"]
    PG -->|"REST poll 30 s<br/>GET /wallets"| FE
    PG -->|"REST on-demand<br/>GET /wallets/:address"| FE
Loading

Client Connections

REST (polling)

The frontend polls every 30 seconds for leaderboard data. Wallet detail data is fetched on demand when a row is clicked.

Endpoint When Returns
GET /api/v1/wallets Poll every 30 s Wallet[] — paginated leaderboard sorted by score desc. Filter by ?tier=
GET /api/v1/wallets/:address On wallet select WalletDetail (Wallet + trade_count, market_count)
GET /api/v1/wallets/:address/history On wallet select WalletScoresHistory[] — score snapshots for sparkline
GET /api/v1/wallets/:address/trades On wallet select TradeWithMarket[] — trades + market_question via LEFT JOIN
GET /healthcheck Monitor { success, timestamp, uptime, stream }

Rate limit: 100 requests / 60 seconds per IP.

WebSocket (socket.io)

One namespace. Clients emit "subscribe" / "unsubscribe"; server confirms with "subscribed" / "unsubscribed" { success: boolean } then pushes "data" events.

/stats — aggregate broadcast

Subscribe once. The server emits a full aggregate snapshot immediately on subscribe (so clients see data before the first tick) and then every 15 seconds.

socket.on("data", {
    total_wallets: number,
    normal: number,
    watchlist: number,
    suspicious: number,
    flagged: number,

    // Per-signal point-bucket counts
    conc_25: number,
    conc_15: number,
    conc_5: number,
    conc_0: number,
    mkt_25: number,
    mkt_15: number,
    mkt_5: number,
    mkt_0: number,
    size_25: number,
    size_15: number,
    size_5: number,
    size_0: number,
    timing_25: number,
    timing_15: number,
    timing_5: number,
    timing_0: number,
    age_25: number,
    age_15: number,
    age_5: number,
    age_0: number,
    age_null: number,

    all_max: number, // wallets with all five signals at max (25 pts each)
});

Scoring Model

Five signals, 25 pts each. Max raw = 125, clamped to 100. Scoring runs entirely inside PostgreSQL via score_wallets(p_addresses text[]) — a set-returning function that joins trades, markets, and wallets in five CTEs and returns one scored row per address. The service layer calls this function and wraps the upsert + history insert in a single transaction.

Final Score = min( H_C + H_MC + H_MS + H_ET + H_WA , 100 )   →  0–100
Score Tier
80–100 🚨 Flagged Insider
60–79 ⚠️ Suspicious
30–59 👁️ Watchlist
0–29 Normal

A wallet enters scoring:pending when: (1) a new trade is ingested, or (2) the 6-hour rescore sweep runs for all Watchlist+ wallets.


Signal 1 — Trade Concentration (H_C, max 25 pts)

Definition: wallet's USDC in its top market / wallet's total lifetime USDC on Polymarket.

Where top market = the single market with the highest SUM(size_usdc) across all of the wallet's trades — where it spent the most, not where it traded most frequently.

Answers "what fraction of this wallet's own lifetime Polymarket activity is dominated by a single market?" A wallet that put everything into one market scores 100 %; a wallet spread evenly across 50 markets scores ~2 %.

Volume Concentration Points
> 90 % 25
70–90 % 15
50–70 % 5
< 50 % 0

Alternative interpretations not used:

  • Market share concentration (wallet's USDC in market / total USDC volume of that market) — measures whether the wallet is a significant fraction of a thin market's liquidity. A property of the market, not the wallet's behaviour pattern.
  • Size-gated concentration — only scoring concentration when the top-market position exceeds a threshold (e.g. $1k) to filter out trivial all-in bets. Not implemented — noise from tiny wallets is acceptable at current scale.

Signal 2 — Market Count (H_MC, max 25 pts)

Definition: number of distinct Polymarket markets the wallet has ever traded.

Answers "how narrowly focused is this wallet?" Insiders act on specific information about specific events — they trade very few markets. A broad trader spread across many markets is less likely to be acting on inside information.

Distinct Markets Points
1 25
2–3 15
4–5 5
≥ 6 0

Signal 3 — Position Size (H_MS, max 25 pts)

Definition: wallet's total lifetime USDC volume across all Polymarket trades.

Answers "did this wallet deploy meaningful capital?" Insiders drop large amounts of money on their bets. Tiny wallets are noise and should score low regardless of other signals.

Lifetime Volume Points
≥ $10 000 25
≥ $1 000 15
≥ $100 5
< $100 0

Signal 4 — Entry Timing (H_ET, max 25 pts)

Definition: how far through the market's active lifecycle the wallet first entered its primary market (same top market used for H_C).

market_open  = COALESCE(markets.started_at, MIN(all trades in market).filled_at)
market_close = MAX(all trades in market).filled_at   -- proxy for resolution
wallet_entry = MIN(wallet's trades in primary market).filled_at

entry_ratio  = (wallet_entry − market_open) / (market_close − market_open)

Answers "did this wallet enter late, when insiders typically act?" A wallet that opens a position in the last 10 % of a market's active life is highly suspicious. Early entrants (before the 50 % mark) are not penalized.

If market_open = market_close (single-timestamp market) → 0 pts (undefined, treated as neutral).

Entry into Market Lifecycle Points
> 90 % 25
> 70 % 15
> 50 % 5
≤ 50 % or undefined 0

Resolution proxy: market_close = MAX(filled_at) is an accurate proxy for resolved markets (trading stops at resolution). For still-open markets it tracks the most recent trade, making scores for those markets less reliable. Since insider detection is primarily retrospective this is acceptable.


Signal 5 — Wallet Age (H_WA, max 25 pts)

Definition: gap between the wallet's first-ever USDC.e receipt on Polygon and its first Polymarket trade.

Answers "was this wallet created specifically to make this trade?" A wallet funded and trading within hours is a strong insider signal. Established wallets with a long history of on-chain activity are less suspicious.

Sourced via alchemy_getAssetTransfers (RPC_URL_137). Stored as first_tx_at on the wallets table. Scores 0 if first_tx_at IS NULL — RPC unavailable, or wallet never received USDC.e on Polygon before its first trade.

Gap: first USDC.e receipt → first Polymarket trade Points
< 1 hour 25
1–24 hours 15
1–7 days 5
> 7 days or NULL 0

Potential Signals (explored, not in production)

Two additional signals were prototyped in the original scoring function but removed when the schema was simplified.

Signal 6 — Funding Source (H_FS, up to 30 pts)

Definition: who funded this wallet's USDC on Polygon, and whether that funder is already flagged.

Answers "is this wallet part of a known funding cluster?" A wallet that was directly funded by a flagged insider is almost certainly related.

Condition Points
Funded directly by a flagged wallet 30
Shares a funder with a suspicious/flagged wallet 20
Funded by an unrecognised non-CEX wallet 10
CEX-funded or funder unknown 0

Additionally, being funded by a flagged wallet triggered a hard override floor — the final score could not go below 70 regardless of other signals.

Entry Timing — Retroactive Price Multiplier

The original H_ET also had a 1.5× retroactive multiplier: if the wallet bet YES at a price < 0.30 on a market that resolved YES, or bet NO at a price > 0.70 on a market that resolved NO, the raw timing score was multiplied by 1.5 (capped at 45 pts). This rewards cases where the wallet not only entered late but also took a contrarian position at long odds — and was right.


Getting Started

Prerequisites

  • Node.js ≥ 20
  • pnpm ≥ 9
  • A Supabase project (PostgreSQL)
  • A Redis instance
  • The Graph API key (GRAPH_API_KEY)
  • An Alchemy RPC URL for Polygon mainnet (RPC_URL_137)

Environment Variables

Copy the example files and fill in values:

cp packages/backend/.env.example packages/backend/.env
cp packages/frontend/.env.example packages/frontend/.env.local

packages/backend/.env — see packages/backend/.env.example

Variable Required Description
DATABASE_URL Yes PostgreSQL connection string (Supabase)
REDIS_URL Yes Redis connection string
GRAPH_API_KEY Yes The Graph Gateway API key
RPC_URL_137 Yes Alchemy RPC URL for Polygon mainnet
PORT No Express port (default 8000)
NODE_ENV No development | production
LOG_LEVEL No Comma-separated log levels

packages/frontend/.env.local — see packages/frontend/.env.example

Variable Required Description
NEXT_PUBLIC_API_BASE_URL Yes Backend base URL (e.g. http://localhost:8000)

Install & Run

pnpm install

# Development
pnpm dev:backend     # builds common, starts backend with watch mode
pnpm dev:frontend    # builds common, starts Next.js dev server

# Production build
pnpm build:backend
pnpm build:frontend

# Quality checks
pnpm check           # lint + typecheck together
pnpm format:check

Database Setup

Apply the Drizzle schema to your Supabase project and run any SQL migrations (including the score_wallets PostgreSQL function):

cd packages/common
pnpm exec drizzle-kit push

DB reset: When wiping the database, also clear both Redis sync cursors — on restart, subgraph:last_synced_at is used as the timestamp_gt bound for the bulk query, so a stale timestamp will skip all historical data even if subgraph:last_event_id is cleared:

redis-cli DEL subgraph:last_event_id subgraph:last_synced_at

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors