
Plan to add s2 storage to browser events#229

Open
archandatta wants to merge 2 commits into main from archand/plan/events-s2-storage

Conversation

Contributor

@archandatta commented Apr 24, 2026

Note

Low Risk
Documentation-only change adding a design plan; no runtime code, APIs, or schemas are modified in this PR.

Overview
Adds a new design document, plans/s2-storage.md, outlining a proposed durable storage sink for browser events using S2 (including proposed components, configuration/env vars, shutdown sequencing, and planned endpoint/schema touchpoints).

Reviewed by Cursor Bugbot for commit 676dbd7.

@firetiger-agent

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR title and empty body provide insufficient information to determine if this changes kernel API endpoints or Temporal workflows; please clarify the scope of changes or opt in manually.

To monitor this PR anyway, reply with @firetiger monitor this.

@archandatta archandatta changed the title feat: add plan Plan to add s2 storage to browser events Apr 24, 2026
@archandatta archandatta requested review from Sayan- and rgarcia April 24, 2026 19:36
Contributor

@Sayan- left a comment

few q's!

Comment thread plans/s2-storage.md
Comment on lines +72 to +77
### 1. Stream name = capture session ID

Each capture session maps to a dedicated stream named by the session UUID. Streams are created automatically on first write, via S2's create-stream-on-append basin feature. This means:

- Replaying a session = reading one stream from seq 0
- Concurrent sessions write to separate streams with no coordination
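The per-session stream model in the excerpt above can be sketched with an in-memory stand-in for the S2 basin (illustrative only, not the S2 SDK):

```go
package main

import "fmt"

// basin is an in-memory stand-in for an S2 basin: one stream per
// capture session, created on first append, replayed from seq 0.
type basin struct{ streams map[string][]string }

// Append writes a record to the session's stream, creating the
// stream on first write (mirroring create-stream-on-append).
func (b *basin) Append(session, record string) {
	if b.streams == nil {
		b.streams = map[string][]string{}
	}
	b.streams[session] = append(b.streams[session], record)
}

// Replay returns the session's full stream, from sequence 0.
func (b *basin) Replay(session string) []string {
	return b.streams[session]
}

func main() {
	var b basin
	b.Append("session-a", "click")
	b.Append("session-b", "scroll") // concurrent session: separate stream, no coordination
	b.Append("session-a", "keydown")
	fmt.Println(b.Replay("session-a")) // [click keydown]
}
```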

Contributor

do we persist capture session anywhere? mainly trying to understand how we'll do "reads" (e.g. after a browser session is destroyed or something)


Contributor Author

Yeah, good point. I still have to make a pass at the kernel API to add the other endpoints; as part of that I'll add a DB field, field.String("s2_stream"), to capture this.
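A sketch of what that ent schema field might look like (the `CaptureSession` schema name and the modifiers are illustrative; only `field.String("s2_stream")` comes from the comment above):

```go
// Fields of the CaptureSession schema (hypothetical), persisting the
// S2 stream name so events can be read back after browser teardown.
func (CaptureSession) Fields() []ent.Field {
	return []ent.Field{
		field.String("s2_stream").
			Optional(). // sessions predating S2 delivery have no stream
			Comment("S2 stream holding this session's browser events"),
	}
}
```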

Comment thread plans/s2-storage.md
Comment on lines +83 to +92
### 3. Batching: 100ms linger / 50 records (S2 backend)

The S2 SDK batcher coalesces records before flushing to the network. Configuration:

```
Linger: 100ms
MaxRecords: 50
```

These are independent of the ring buffer read loop — the writer appends one record per ring Read, and the batcher decides when to flush.

Contributor

may want to have these be env vars / configurable so we can externally control

Comment thread plans/s2-storage.md

---

## System Context

Contributor

I think this flow makes sense overall. I think it'd be helpful to have more clarity overall on:

  1. how enabling the s2 delivery ties into the existing APIs
  2. credentials for s2 within the vm

@archandatta archandatta requested a review from Sayan- May 4, 2026 17:39
Comment thread plans/s2-storage.md
1. ctx cancelled (SIGINT/SIGTERM)
2. EventsStorageWriter.Run returns (reader unblocks from cancelled ctx)
3. storageDone channel closes
4. storageWriter.Close() — drains in-flight S2 writes

Contributor

I think it's also helpful to walk through the actual browser teardown pathway and confirm we have the right interface in this API server to flush the capture stream to S2 on a best-effort basis. I'd expect a number of our users to do something like try -> {start capture session, run automation} -> catch (log error) -> finally {delete browser}. So ensuring we can get the data out before the VM is gone is valuable here ^^
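A sketch of the best-effort flush the comment describes, assuming the delete-browser pathway can bound the flush with a deadline so teardown is never blocked indefinitely (all names and the 5s timeout are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// capture is a stub for a capture session with buffered events.
type capture struct{ buffered []string }

// Flush persists buffered events, giving up if ctx expires so that
// teardown is never blocked indefinitely.
func (c *capture) Flush(ctx context.Context) error {
	for _, e := range c.buffered {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
			fmt.Println("persisted:", e)
		}
	}
	c.buffered = nil
	return nil
}

// deleteBrowser flushes the capture stream best-effort, then proceeds
// with VM teardown regardless of the flush outcome.
func deleteBrowser(c *capture) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := c.Flush(ctx); err != nil {
		fmt.Println("flush incomplete:", err) // best effort: log and continue
	}
	fmt.Println("browser deleted")
}

func main() {
	deleteBrowser(&capture{buffered: []string{"event-1"}})
}
```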
