andrewhinh/rlm-rs

rlm-rs

RLMs in Rust using RustPython and gVisor

Development

System requirements

  • Linux on x86-64 or ARM64. See the AWS EC2 instructions below.

Installation

prek install

Linux:

sudo apt-get update && sudo apt-get install -y runsc
sudo runsc install
sudo systemctl restart docker

EC2:

IAM_USER=<iam-user> make aws-setup                          # optionally specify IAM_USER to create access key, then create key pair
ARCH=arm64 INSTANCE_TYPE=t4g.medium ROOT_GB=50 make create  # optionally specify ARCH, INSTANCE_TYPE, ROOT_GB, then create instance
make conn

# in the instance
make ec2-setup

Setup

Create a .env file with the following variable:

OPENAI_API_KEY=<api-key>

Commands

Run make help for the full list of commands.

For both Linux and EC2 instances:

RLM_METHOD=<rlm|lambda_rlm> cargo run
make app METHOD=<rlm|lambda_rlm>
make goose HOST=<host>
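Both run commands select the method by name. The sketch below is hypothetical: the Method enum and method_from_env are illustrative names, not code from this repository. It shows one way the RLM_METHOD toggle could be parsed so that an unset or misspelled value fails at startup instead of being silently ignored.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical enum for the two values RLM_METHOD accepts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Method {
    Rlm,
    LambdaRlm,
}

impl FromStr for Method {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "rlm" => Ok(Method::Rlm),
            "lambda_rlm" => Ok(Method::LambdaRlm),
            other => Err(format!("unknown RLM_METHOD {other:?}; expected rlm or lambda_rlm")),
        }
    }
}

/// Reads RLM_METHOD and rejects unset or unknown values rather than assuming a default.
fn method_from_env() -> Result<Method, String> {
    let raw = env::var("RLM_METHOD").map_err(|e| format!("RLM_METHOD: {e}"))?;
    raw.parse()
}

fn main() {
    match method_from_env() {
        Ok(method) => println!("using {method:?}"),
        Err(err) => eprintln!("{err}"),
    }
}
```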

Roadmap

  • port rlm-minimal to Rust and RustPython
  • unblock event loop
  • add support for depth > 1
  • add shared program state
  • add per-session REPL sandboxing with gVisor
  • add toggle for λ-RLM paper and code

Details

Sandboxing

[Figure: architecture diagram]

Requests within a session remain ordered while different sessions execute concurrently, so one long-running REPL interaction does not create cross-session head-of-line blocking for unrelated traffic. Ingress is bounded and fails fast under saturation instead of queueing indefinitely, and pool ownership is centralized in a single broker to avoid contention around mutable container state.
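The three properties above can be illustrated with a minimal, std-only sketch. It assumes OS threads and std::sync::mpsc channels as stand-ins for the project's async tasks and gVisor-sandboxed containers. Request, submit, and run_broker are hypothetical names, not the repository's API. A bounded sync_channel with try_send provides fail-fast ingress. The broker thread is the only owner of the session map, so the map needs no lock. Each session gets its own FIFO worker, which keeps commands ordered within a session while different sessions run concurrently.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender, SyncSender, TrySendError};
use std::thread::{self, JoinHandle};

/// Hypothetical REPL request tagged with the session it belongs to.
struct Request {
    session: u32,
    code: String,
}

/// Bounded ingress: try_send returns immediately when the queue is full,
/// so callers get an error instead of waiting indefinitely.
fn submit(ingress: &SyncSender<Request>, req: Request) -> Result<(), &'static str> {
    match ingress.try_send(req) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err("busy"),
        Err(TrySendError::Disconnected(_)) => Err("closed"),
    }
}

/// The broker is the sole owner of the session pool. Each session gets its own
/// worker with a FIFO queue: commands within a session run in order, while
/// workers for different sessions run concurrently.
fn run_broker(ingress: Receiver<Request>, results: Sender<(u32, String)>) {
    let mut pool: HashMap<u32, (Sender<String>, JoinHandle<()>)> = HashMap::new();
    for req in ingress {
        let session = req.session;
        let (queue, _) = pool.entry(session).or_insert_with(|| {
            let (tx, rx) = mpsc::channel::<String>();
            let results = results.clone();
            let handle = thread::spawn(move || {
                for code in rx {
                    // Stand-in for running `code` in this session's sandboxed REPL.
                    let _ = results.send((session, format!("ran {code}")));
                }
            });
            (tx, handle)
        });
        let _ = queue.send(req.code);
    }
    // Ingress closed: drop each session queue so its worker exits, then join it.
    for (_, (queue, handle)) in pool.drain() {
        drop(queue);
        let _ = handle.join();
    }
}

fn main() {
    let (ingress_tx, ingress_rx) = mpsc::sync_channel::<Request>(16);
    let (results_tx, results_rx) = mpsc::channel();
    let broker = thread::spawn(move || run_broker(ingress_rx, results_tx));

    for i in 0..3 {
        for session in [1, 2] {
            let req = Request { session, code: format!("cmd{i}") };
            submit(&ingress_tx, req).expect("queue has room");
        }
    }
    drop(ingress_tx);
    broker.join().unwrap();

    // Output from different sessions may interleave; each session's own lines stay in order.
    for (session, output) in results_rx.iter() {
        println!("session {session}: {output}");
    }
}
```

Rejecting with "busy" when the bounded queue is full is what turns saturation into an immediate error for the caller rather than an ever-growing backlog.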

Async Runtime

[Figure: async runtime diagram]

The async runtime separates network-facing work from interpreter execution so that blocking Python operations do not starve request handling or model I/O. REPL commands are dispatched through channels to a dedicated worker thread, which isolates synchronous interpreter calls from the async control plane. A persistent REPL worker is used to preserve interpreter-local state across iterative commands and to avoid per-command thread startup costs.
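The channel-based dispatch described above can be sketched roughly as follows. The sketch assumes a plain OS thread and std mpsc channels in place of the project's actual async runtime. ReplHandle, Command, and the HashMap of variables are hypothetical stand-ins for the RustPython interpreter, not the repository's types. Each command carries its own reply channel, and a single long-lived worker keeps its state between commands.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Commands for the REPL worker. Each carries its own reply channel,
/// a std stand-in for a oneshot channel in an async runtime.
enum Command {
    Set { name: String, value: i64, reply: Sender<String> },
    Get { name: String, reply: Sender<Option<i64>> },
}

/// Cheap, cloneable handle used by request-handling code.
#[derive(Clone)]
struct ReplHandle {
    tx: Sender<Command>,
}

impl ReplHandle {
    /// Spawns one long-lived worker thread. Its `vars` map stands in for
    /// interpreter-local state that persists across commands.
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Command>();
        thread::spawn(move || {
            let mut vars: HashMap<String, i64> = HashMap::new();
            // Synchronous "interpreter" work runs here, off the caller's thread.
            for cmd in rx {
                match cmd {
                    Command::Set { name, value, reply } => {
                        let msg = format!("{name} = {value}");
                        vars.insert(name, value);
                        let _ = reply.send(msg);
                    }
                    Command::Get { name, reply } => {
                        let _ = reply.send(vars.get(&name).copied());
                    }
                }
            }
            // The loop ends once every handle is dropped; the worker then exits.
        });
        ReplHandle { tx }
    }

    fn set(&self, name: &str, value: i64) -> String {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(Command::Set { name: name.to_string(), value, reply })
            .expect("REPL worker is running");
        rx.recv().expect("REPL worker replied")
    }

    fn get(&self, name: &str) -> Option<i64> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(Command::Get { name: name.to_string(), reply })
            .expect("REPL worker is running");
        rx.recv().expect("REPL worker replied")
    }
}

fn main() {
    let repl = ReplHandle::spawn();
    println!("{}", repl.set("x", 41));
    // A command sent from another thread still sees state left by the first one.
    let other = repl.clone();
    let seen = thread::spawn(move || other.get("x")).join().unwrap();
    println!("x = {seen:?}");
}
```

In async code the caller would typically await the reply instead of calling a blocking recv, for example with tokio's oneshot channel, so request-handling tasks never block on the interpreter.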

Load Testing

The load test runs 20 simulated users for 5 minutes against /v1/chat/completions.

[Figures: load-test request and response results]

Credit
