RLMs in Rust using RustPython and gVisor
prek install

Linux:
sudo apt-get update && sudo apt-get install -y runsc
sudo runsc install
sudo systemctl restart docker

EC2:
- aws cli and auth setup
IAM_USER=<iam-user> make aws-setup # optionally specify IAM_USER to create access key, then create key pair
ARCH=arm64 INSTANCE_TYPE=t4g.medium ROOT_GB=50 make create # optionally specify ARCH, INSTANCE_TYPE, ROOT_GB, then create instance
make conn
# in the instance
make ec2-setup

Create a .env file with the following variables:
OPENAI_API_KEY=<api-key>

Run make help for the full list of commands.
For both Linux and EC2 instances:
RLM_METHOD=<rlm|lambda_rlm> cargo run
make app METHOD=<rlm|lambda_rlm>
make goose HOST=<host>

- port rlm-minimal to Rust and RustPython
- unblock event loop
- add support for depth > 1
- add shared program state
- add per-session REPL sandboxing with gVisor
- add toggle for λ-RLM paper and code
Requests within a session remain ordered while different sessions execute concurrently, so one long-running REPL interaction does not create cross-session head-of-line blocking for unrelated traffic. Ingress is bounded and fails fast under saturation instead of queueing indefinitely, and pool ownership is centralized in a single broker to avoid contention around mutable container state.
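As a rough illustration of the ordering and fail-fast behavior described above, here is a minimal sketch using only the standard library. The names `Ingress`, `Request`, and `Rejected` are hypothetical and are not taken from this repository; the sketch assumes one bounded FIFO queue per session and does not model the broker that owns the container pool.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical request shape; the project's real types may differ.
pub struct Request {
    pub session_id: String,
    pub body: String,
}

/// Reasons a request is refused at the door instead of being queued.
#[derive(Debug, PartialEq)]
pub enum Rejected {
    Saturated,      // the session's bounded queue is full
    UnknownSession, // no queue was opened for this session id
    Closed,         // the session's consumer has gone away
}

/// One bounded FIFO queue per session (an assumption of this sketch).
pub struct Ingress {
    capacity: usize,
    sessions: HashMap<String, SyncSender<String>>,
}

impl Ingress {
    /// `capacity` must be at least 1; a zero-capacity sync_channel is a rendezvous channel.
    pub fn new(capacity: usize) -> Self {
        Ingress { capacity, sessions: HashMap::new() }
    }

    /// Opens a session and returns the receiver that its worker drains in FIFO order.
    pub fn open_session(&mut self, id: &str) -> Receiver<String> {
        let (tx, rx) = sync_channel(self.capacity);
        self.sessions.insert(id.to_string(), tx);
        rx
    }

    /// Never blocks: a full queue yields `Rejected::Saturated` immediately.
    pub fn submit(&self, req: Request) -> Result<(), Rejected> {
        let tx = self
            .sessions
            .get(&req.session_id)
            .ok_or(Rejected::UnknownSession)?;
        tx.try_send(req.body).map_err(|e| match e {
            TrySendError::Full(_) => Rejected::Saturated,
            TrySendError::Disconnected(_) => Rejected::Closed,
        })
    }
}
```

Because each session has its own queue, a slow consumer on one session only fills that session's queue, and `try_send` turns saturation into an immediate error that a handler could map to a rejection response.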
The async runtime separates network-facing work from interpreter execution so that blocking Python operations do not starve request handling or model I/O. REPL commands are dispatched through channels to a dedicated worker thread, which isolates synchronous interpreter calls from the async control plane. A persistent REPL worker is used to preserve interpreter-local state across iterative commands and to avoid per-command thread startup costs.
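The channel-to-worker pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's code: `spawn_repl_worker`, `ReplHandle`, and `Command` are hypothetical names, and `eval` is a toy stand-in for the RustPython interpreter that only understands `name = value` assignments and name lookups.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A command plus a one-shot channel for its result (hypothetical shape).
struct Command {
    source: String,
    reply: Sender<String>,
}

/// Handle held by request-handling code; cheap to use from any thread.
pub struct ReplHandle {
    tx: Sender<Command>,
}

/// Toy stand-in for the interpreter: "name = value" stores, "name" looks up.
fn eval(state: &mut HashMap<String, String>, source: &str) -> String {
    match source.split_once('=') {
        Some((name, value)) => {
            state.insert(name.trim().to_string(), value.trim().to_string());
            "ok".to_string()
        }
        None => state
            .get(source.trim())
            .cloned()
            .unwrap_or_else(|| "undefined".to_string()),
    }
}

/// Spawns one long-lived worker thread that owns all interpreter state.
pub fn spawn_repl_worker() -> ReplHandle {
    let (tx, rx) = channel::<Command>();
    thread::spawn(move || {
        // State lives for the whole thread, so it persists across commands.
        let mut state = HashMap::new();
        // The loop ends when every sender (every ReplHandle) has been dropped.
        for cmd in rx {
            let output = eval(&mut state, &cmd.source);
            // The caller may have given up waiting; ignore a closed reply channel.
            let _ = cmd.reply.send(output);
        }
    });
    ReplHandle { tx }
}

impl ReplHandle {
    /// Sends one command and waits for its result; None if the worker is gone.
    pub fn run(&self, source: &str) -> Option<String> {
        let (reply, result) = channel();
        self.tx
            .send(Command { source: source.to_string(), reply })
            .ok()?;
        result.recv().ok()
    }
}
```

Here `run` blocks the calling thread on `recv`, which keeps the sketch dependency-free. In an async handler the reply would presumably be awaited on an async one-shot channel instead, so that the executor thread is not blocked while the interpreter runs.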
The load test runs 20 simulated users for 5 minutes against /v1/chat/completions.



