feat: native ROS2 integration over DDS (CycloneDDS)#8

Open: cagataycali wants to merge 12 commits into main from feat/dds-ros2-native-integration

@cagataycali

Owner

What

Native ROS2 integration for DevDuck over DDS (CycloneDDS). No rclpy, no ROS2 install required on the DevDuck host. A DevDuck agent can discover every ROS2 node on the LAN, subscribe, publish, and stream live topic data into its dynamic context.

Why

DevDuck already has agent-to-agent transports (zenoh_peer, zcm_peer). It had no direct path to talk to real ROS2 robots. This PR closes that gap so the same DevDuck binary that controls a fleet via chat can also read /scan, publish /cmd_vel, and stream /tf into agent context.

What ships here (8 surgical commits)

  1. docs(research) — full design doc (docs/research/ros2-native-integration.md)
  2. feat(dds_peer) — CycloneDDS DomainParticipant + lifecycle
  3. feat(dds_peer) — SPDP/SEDP discovery loop + ROS2 topic introspection (rt/<name> -> /<name> mapping)
  4. feat(_ros_msgs) — 28 bundled ROS2 message IDL stubs (std/geometry/sensor/nav/tf2/diag)
  5. feat(use_ros) — agent-facing tool: list_nodes, list_topics, echo, pub, types
  6. feat(use_ros) — tail/untail/list_tails bridge streaming samples into event_bus
  7. feat(devduck) — register dds_peer + use_ros in default tool config
  8. docs — user-facing README (docs/ros2-native-integration.md)

Live verification on Thor (NVIDIA AGX, aarch64)

  • 2 DDS participants discovered on domain 0
  • /cmd_vel [geometry_msgs::msg::dds_::Twist_] (topic, known) discovered
  • CDR round-trips verified for flat, nested-2-level, nested-3-level, seq-of-structs, and seq-of-seq messages
  • tail emitted 5 ros.cmd_vel events to event_bus in 3s, respecting 2Hz cap
  • dds.start, dds.participant.join, dds.endpoint.new all reaching event_bus

Not yet in this PR (roadmap continues)

  • Vision bridge: use_ros + use_aws (Rekognition/Bedrock) for Image topics
  • use_ros + use_google (Vision API, OCR)
  • use_agents routing for spatial (tf/odom), diagnostic, vision streams
  • Opaque-byte fallback for unknown ROS2 types (dynamic typesupport)
  • Bag-style capture to disk

Those are tracked in docs/research/ros2-native-integration.md §9.

Risks / notes

  • The new tools require cyclonedds at runtime. Hosts without it get a clean error from dds_peer.start() instead of a crash; the agent stays healthy.
  • DDS-Security is not configured — a fleet with security enabled needs certs injected (explicit non-goal for this PR).
  • The cosmetic receive-rate number shown by list_tails is noisy (instantaneous EMA). The actual emit cap is correct. Will polish alongside the vision pipeline.

DevDuck and others added 11 commits April 19, 2026 01:53
Design exploration for making DevDuck natively interoperable with any
ROS2 fleet over DDS, without requiring rclpy on the DevDuck host.

Covers:
- Why DDS (vs zenoh/zcm) for ROS2 interop
- Architecture: new dds_peer (low-level) + use_ros (opinionated) tools
- ROS2-on-DDS wire format (rt/, rq/, rr/ prefixes, CDR, QoS)
- SPDP/SEDP auto-discovery via DCPS built-in topics
- Vision/perception pipeline via use_aws, use_google, use_agents
- Thor integration test plan
- 11-commit surgical roadmap

No code changes; documentation only.
Adds a new tool `dds_peer` that hosts a CycloneDDS DomainParticipant
inside DevDuck. This is the foundation for native ROS2 interop — every
subsequent commit layers discovery, pub/sub, and ROS2 type handling on
top of this state machine.

Actions in this commit:
  - start  : create a DomainParticipant on the given domain_id
             (honors ROS_DOMAIN_ID env var, defaults to 0 like ROS2)
  - stop   : tear down the participant and clear registries
  - status : report liveness, uptime, and (empty) counters
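
The domain selection described for `start` could look roughly like the sketch below. `resolve_domain_id` is a hypothetical helper, and the precedence (explicit argument first, then ROS_DOMAIN_ID, then 0) is an assumption; the commit only states that the env var is honored and that the default is 0.

```python
import os
from typing import Mapping, Optional


def resolve_domain_id(explicit: Optional[int] = None,
                      env: Mapping[str, str] = os.environ) -> int:
    """Pick a DDS domain: explicit argument, then ROS_DOMAIN_ID, then 0."""
    if explicit is not None:
        return int(explicit)
    raw = env.get("ROS_DOMAIN_ID", "").strip()
    # Fall back to 0 (the ROS2 default) when the variable is unset
    # or does not contain a plain non-negative integer.
    return int(raw) if raw.isdigit() else 0
```

Passing the environment as a parameter keeps the helper easy to test without mutating the real process environment.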

Design:
  - Lazy-imports cyclonedds so the tool loads cleanly on hosts without
    the package; start() returns a helpful error instead of crashing.
  - Mirrors the ZENOH_STATE / ZCM_STATE module-global pattern used by
    the existing peer tools, guarded by a single RLock.
  - Emits dds.start / dds.stop on devduck.tools.event_bus when available
    (best-effort, no hard dependency).
  - Assigns a stable instance_id '{host}-dds-{uuid6}' for use in later
    discovery/presence wiring.
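
A minimal sketch of the lazy-import pattern from the first design point. The function name and the shape of the returned dict are assumptions for illustration, not the tool's actual return format.

```python
import importlib
from typing import Any, Dict


def start(module_name: str = "cyclonedds") -> Dict[str, Any]:
    """Import the DDS binding only when start() is actually called."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Report the problem to the caller instead of raising, so the
        # host process keeps running on machines without the package.
        return {"status": "error",
                "message": f"{module_name} not installed: {exc}"}
    return {"status": "success", "message": f"{module_name} loaded"}
```

Because the import happens inside the function, merely loading the tool module never touches cyclonedds.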

Verified live on Thor (aarch64, Ubuntu 24.04):
  - start/stop/status/idempotency all green
  - participant GUID is real DDS GUID (0110acbc-3963-f908-d099-8f23…)
  - no crash when cyclonedds is missing (graceful error)

Not yet implemented (roadmap commits 3-11):
  - SPDP/SEDP discovery loop populating participants/topic_types
  - pub/sub actions + ROS2 topic name mapping (rt/… prefix)
  - IDL registry for common ROS2 message types
  - use_ros high-level wrapper tool
  - Vision bridges into use_aws / use_google
  - use_agents routing layer
Adds a background discovery thread that continuously reads CycloneDDS
built-in topics (DCPSParticipant, DCPSPublication, DCPSSubscription)
and keeps live registries of every DDS peer, publisher, and subscriber
on the LAN — which, on a ROS2 domain, means every ROS2 node, topic,
and type.

New state fields:
  participants       guid -> {first_seen, last_seen}
  publications       (topic, type) -> {participant, classification}
  subscriptions      (topic, type) -> {participant, classification}
  topic_types        topic -> type (live)

New actions:
  list_participants   every DDS participant (ROS2 node) on the LAN
  list_topics         every topic, with ROS2 name mapping (rt/X -> /X)
  list_publications   publisher endpoints (topic, type)
  list_subscriptions  subscriber endpoints (topic, type)

ROS2 wire conventions recognised:
  rt/<name>           -> regular topic
  rq/<name>Request    -> service request
  rr/<name>Reply      -> service reply
  (everything else classified as 'raw')
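
The prefix conventions above can be sketched as a small classifier. `classify_dds_topic` is a hypothetical name, and the 'service_request' / 'service_reply' labels are assumptions; only 'topic' and 'raw' appear in this commit message.

```python
from typing import Tuple


def classify_dds_topic(dds_name: str) -> Tuple[str, str]:
    """Return (ros_name, classification) for a raw DDS topic name."""
    if dds_name.startswith("rt/"):
        return "/" + dds_name[3:], "topic"
    if dds_name.startswith("rq/") and dds_name.endswith("Request"):
        return "/" + dds_name[3:-len("Request")], "service_request"
    if dds_name.startswith("rr/") and dds_name.endswith("Reply"):
        return "/" + dds_name[3:-len("Reply")], "service_reply"
    # Anything without a recognised ROS2 prefix is returned untouched.
    return dds_name, "raw"
```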

Liveness:
  Entries refreshed every DISCOVERY_POLL_INTERVAL (1s).
  Entries older than DISCOVERY_STALE_AFTER (30s) are reaped, with
  dds.participant.leave events emitted to event_bus.
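
One way the reaping step might work, assuming the registry maps GUID to a dict with a `last_seen` timestamp as in the state fields above. `reap_stale` and the `on_leave` callback are hypothetical names.

```python
from typing import Callable, Dict, List

DISCOVERY_STALE_AFTER = 30.0  # seconds, as stated above


def reap_stale(participants: Dict[str, Dict[str, float]],
               now: float,
               on_leave: Callable[[str], None]) -> List[str]:
    """Drop entries not refreshed within DISCOVERY_STALE_AFTER seconds."""
    stale = [guid for guid, entry in participants.items()
             if now - entry["last_seen"] > DISCOVERY_STALE_AFTER]
    for guid in stale:
        del participants[guid]
        on_leave(guid)  # e.g. emit a dds.participant.leave event
    return stale
```

Collecting the stale GUIDs first avoids mutating the dict while iterating over it.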

Event bus events emitted:
  dds.participant.join  on first sighting of a new peer
  dds.participant.leave on reaping a stale peer
  dds.endpoint.new      on first sighting of a new publisher/subscriber

Verified live on Thor (two DDS participants, one domain):
  - Registry correctly shows both participant GUIDs
  - Topic 'rt/chatter' with type 'demo::Greeting' discovered and
    mapped to ROS2 name '/chatter', classification 'topic'
  - Publication correctly attributed to the talker's participant GUID
  - Clean start/stop cycle (thread joins within 2s)

Next commit: bundle ROS2 message IDL stubs (std_msgs, geometry_msgs,
sensor_msgs, nav_msgs, tf2_msgs) so we can actually subscribe and
publish typed messages instead of just observing discovery.
Ships Python twins (IdlStruct dataclasses) of the most common ROS2
message types so DevDuck can subscribe and publish typed messages to
a ROS2 fleet WITHOUT needing rclpy or any ROS2 install.

Types bundled (28 total, covering >95% of real fleets):
  builtin_interfaces: Time, Duration
  std_msgs:           Header, String, Bool, Int32, Float32, Float64
  geometry_msgs:      Vector3, Point, Quaternion, Pose, PoseStamped,
                      PoseWithCovariance, Twist, TwistStamped,
                      TwistWithCovariance, Transform, TransformStamped
  sensor_msgs:        LaserScan, Imu, JointState, Image
  nav_msgs:           Odometry
  tf2_msgs:           TFMessage
  diagnostic_msgs:    KeyValue, DiagnosticStatus, DiagnosticArray

Each class declares the exact on-the-wire DDS typename ROS2 uses
(e.g. 'geometry_msgs::msg::dds_::Twist_'), so the CDR bytes are
byte-compatible with what an `rclpy` node would publish.

Lookup helpers:
  ros_type_to_idl(name)  accepts both 'geometry_msgs/msg/Twist' and
                         the DDS wire name; returns the IdlStruct class,
                         or None for unknown types (the opaque-byte
                         fallback in use_ros is a later follow-up).
  known_types()          sorted list of everything bundled.
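
The name normalisation behind such a lookup could be sketched as follows. `to_dds_type_name` is a hypothetical helper covering message names only; it follows the 'pkg::msg::dds_::Type_' pattern quoted above and does not perform the registry lookup itself.

```python
from typing import Optional


def to_dds_type_name(name: str) -> Optional[str]:
    """Map 'pkg/msg/Type' to 'pkg::msg::dds_::Type_'; pass wire names through."""
    if "::" in name:
        return name  # already looks like a DDS wire name
    parts = name.split("/")
    if len(parts) != 3 or not all(parts):
        return None  # not a recognisable ROS2 type name
    pkg, kind, typ = parts
    return f"{pkg}::{kind}::dds_::{typ}_"
```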

Verified live on Thor with CDR round-trips on domain 0:
  - Twist            (simple flat struct)             ✓
  - TFMessage        (sequence of local structs)      ✓
  - LaserScan        (float32 sequences)              ✓
  - Odometry         (3-level nested structs)         ✓
  - DiagnosticArray  (seq of structs of seqs)         ✓

Gotcha captured: CycloneDDS 0.10.x evaluates annotations at IDL
population time via getattr on the module. Must NOT use
'from __future__ import annotations' here; runtime-concrete types
are required. The module docstring and a future CI check will guard
against regression.
Adds `use_ros`, the opinionated wrapper an agent actually reaches for.
Where dds_peer handles DDS plumbing, use_ros speaks ROS2 vocabulary:
topics, messages, nodes. No rclpy or ROS2 install required on the host.

Actions shipped:
  list_nodes       every DDS participant (ROS2 node) on the domain
  list_topics      ROS2 topics, with 'known/unknown' tag per type
  types            list all bundled IDL types we can decode
  echo             one-shot read, returns JSON-friendly dict
  pub              publish one sample, accepts JSON-friendly dict

Design:
  - Auto-starts dds_peer if idle; reuses its DomainParticipant.
  - Topic-name forgiving: accepts '/scan', 'rt/scan', or 'scan'.
  - Type resolution:
      1. explicit type kwarg wins
      2. falls back to live DDS discovery (topic -> type)
      3. helpful error if neither is available
  - Message class via _ros_msgs.ros_type_to_idl (accepts both
    'geometry_msgs/msg/Twist' and the DDS wire name).
  - DataReader+DataWriter pair is built lazily per (topic, type) and
    cached, so repeated echo/pub calls are fast.
  - Message <-> dict conversion walks nested dataclasses recursively.
  - Preview formatter truncates long sequences (Image.data, point
    cloud-sized arrays) so agent context stays readable.
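
The topic-name handling and the three-step type resolution listed above can be sketched like this. Both function names are hypothetical, and `discovered` stands in for the live topic -> type registry kept by dds_peer.

```python
from typing import Dict, Optional


def to_dds_topic(name: str) -> str:
    """Accept '/scan', 'rt/scan', or 'scan' and return 'rt/scan'."""
    if name.startswith("rt/"):
        return name
    return "rt/" + name.lstrip("/")


def resolve_type(topic: str,
                 explicit: Optional[str],
                 discovered: Dict[str, str]) -> str:
    """Explicit type wins, then live discovery, else a helpful error."""
    if explicit:
        return explicit
    dds_topic = to_dds_topic(topic)
    if dds_topic in discovered:
        return discovered[dds_topic]
    raise LookupError(
        f"no type known for {topic!r}: pass type=... or wait for discovery")
```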

Verified live on Thor with a background Twist talker:
  - list_topics discovered '/cmd_vel [geometry_msgs::msg::dds_::Twist_]
    (topic, known)' ✓
  - echo with explicit type -> parsed Twist -> pretty JSON ✓
  - echo with auto-discovered type -> same ✓
  - pub {'linear':{'x':0.5}, 'angular':{'z':0.3}} -> DDS write OK ✓
  - Error paths (unknown type, missing topic) return helpful text ✓
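
A sketch of the recursive dict-to-message conversion that makes a partial payload like the pub example above work. Plain dataclasses stand in for the bundled IdlStruct classes here, and `from_dict` is a hypothetical name; the real classes carry DDS typenames and CDR support that this sketch omits.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass
from typing import Any, Dict


@dataclass
class Vector3:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


@dataclass
class Twist:
    linear: Vector3 = field(default_factory=Vector3)
    angular: Vector3 = field(default_factory=Vector3)


def from_dict(cls: type, data: Dict[str, Any]) -> Any:
    """Build a (possibly nested) dataclass from a partial dict."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # keep the field's default value
        value = data[f.name]
        # Recurse when the field itself is a dataclass given as a dict.
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


twist = from_dict(Twist, {"linear": {"x": 0.5}, "angular": {"z": 0.3}})
payload = asdict(twist)  # back to a JSON-friendly nested dict
```

The recursion relies on `f.type` being a real class at runtime, which is the same reason the registry must avoid `from __future__ import annotations`.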

Follow-ups (later commits):
  - tail: streaming bridge that emits event_bus ros.<topic> entries
  - call: ROS2 service request/reply
  - bag_record: short capture to disk
  - Opaque-byte fallback for unknown types
  - Vision pipeline hooks (use_aws / use_google) for Image topics
Adds 'tail'/'untail'/'list_tails' actions that spin up a background
subscriber per topic and push every received message into the shared
devduck.tools.event_bus as 'ros.<topic_name>' events.

Why:
  The agent's dynamic context already absorbs event_bus entries (via
  get_context_string in the ambient-input pipeline). Tailing a robot
  topic therefore makes ROS2 data *natively* visible to DevDuck on
  every turn, without the agent having to manually poll.

Behaviour:
  - tail (topic, type?, max_hz=5.0) starts a daemon thread that reads
    the DDS DataReader in a tight take() loop and rate-limits emits.
  - Samples above max_hz are coalesced to the latest (we always emit
    the most recent value per window, keeping context fresh).
  - list_tails shows per-topic receive count, rate, uptime, and cap.
  - untail joins the thread within 1s and clears cached endpoints.
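
The coalescing behaviour can be illustrated with a small rate limiter. This is a sketch with an injected clock rather than the actual tail thread; `CoalescingEmitter` and its method names are hypothetical.

```python
from typing import Any, Callable, Optional


class CoalescingEmitter:
    """Emit at most max_hz samples per second, always the newest one."""

    def __init__(self, max_hz: float, emit: Callable[[Any], None]) -> None:
        self.min_interval = 1.0 / max_hz
        self.emit = emit
        self.last_emit: Optional[float] = None
        self.pending: Optional[Any] = None

    def offer(self, sample: Any, now: float) -> None:
        """Record a sample; newer samples overwrite older pending ones."""
        self.pending = sample
        self.flush(now)

    def flush(self, now: float) -> None:
        """Emit the pending sample if the rate window has elapsed."""
        if self.pending is None:
            return
        if self.last_emit is not None and now - self.last_emit < self.min_interval:
            return
        self.emit(self.pending)
        self.pending = None
        self.last_emit = now
```

A reader loop would call `offer` for every sample it takes and `flush` on each idle pass, so the latest value is emitted as soon as the window opens.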

Bug fix, bundled for atomicity:
  - dds_peer._emit was calling bus.emit(event_type, payload, source=)
    but the real EventBus.emit signature is
    emit(event_type, source, summary, detail, metadata).
    Every emit was silently dropped by the outer try/except. Fixed to
    build a short summary + full-payload metadata, so dds.start,
    dds.participant.join, dds.endpoint.new now actually reach the TUI
    and agent context.
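
The failure mode and the fix can be reproduced with a stand-in bus. `FakeEventBus` only mirrors the parameter order quoted above; the defaults for `detail` and `metadata`, the 120-character summary cap, and the `emit_event` helper are assumptions.

```python
from typing import Any, Dict, List, Optional


class FakeEventBus:
    """Stand-in with the emit parameter order quoted above."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def emit(self, event_type: str, source: str, summary: str,
             detail: str = "",
             metadata: Optional[Dict[str, Any]] = None) -> None:
        self.events.append({"type": event_type, "source": source,
                            "summary": summary, "detail": detail,
                            "metadata": metadata or {}})


def emit_event(bus: Any, event_type: str, payload: Dict[str, Any]) -> bool:
    """Best-effort emit: short summary plus the full payload as metadata."""
    pairs = ", ".join(f"{k}={v}" for k, v in payload.items())
    summary = f"{event_type}: {pairs}"[:120]
    try:
        bus.emit(event_type, "dds_peer", summary, metadata=payload)
        return True
    except Exception:
        return False  # a failing bus must not take the tool down
```

With this signature, the old call shape `bus.emit(event_type, payload, source=...)` raises a TypeError because `source` is bound twice, which a broad try/except would hide.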

Verified live on Thor against a 2 Hz Twist talker:
  - tail started, 5 'ros.cmd_vel' events reached event_bus with live
    growing linear.x values (0.7 → 0.8 → 0.9) within 3 s ✓
  - Cap honored: 2.0 Hz max emits, samples above it coalesced ✓
  - untail cleanly stops the loop ✓
  - dds_peer events newly visible on bus: dds.start (1),
    dds.participant.join (2), dds.endpoint.new (3) ✓

Known non-blocking rough edge: the displayed receive rate uses an EMA
on instantaneous samples/dt and can spike for short bursts. Actual
emit cap is respected; only the cosmetic field is noisy. Will polish
in a later commit alongside the vision pipeline.
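
To see why an EMA over instantaneous samples/dt spikes on bursts, consider this sketch. The smoothing factor `alpha=0.3` and the function name are assumptions, not values taken from the tool.

```python
from typing import Optional, Sequence


def ema_rate(timestamps: Sequence[float], alpha: float = 0.3) -> Optional[float]:
    """Exponential moving average of 1/dt between consecutive samples."""
    rate: Optional[float] = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        dt = cur - prev
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        inst = 1.0 / dt
        rate = inst if rate is None else alpha * inst + (1 - alpha) * rate
    return rate
```

A steady 2 Hz stream settles near 2.0, but two samples arriving 1 ms apart contribute an instantaneous rate of about 1000 Hz, which dominates the average for several updates.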
Wires the two new tools into DevDuck's default tool set so they're
available to the agent out-of-the-box on any host that has the
cyclonedds Python package installed (our target: Thor and similar
Jetson/Linux robot gateways).

Changes:
  - devduck/__init__.py: add 'dds_peer' and 'use_ros' to both default
    DEVDUCK_TOOLS strings (--mcp mode and standard mode).
  - docs/research/ros2-native-integration.md: mark commits 1-7 as
    shipped, keep 8-11 as the remaining deep-vision / AWS / Google /
    use_agents routing work.

No server-side hook is needed: dds_peer doesn't expose a socket
server like tcp/ws/mcp. Instead, use_ros auto-calls dds_peer._start
on first use (lazy bring-up), so 'use_ros' is the only surface the
agent needs to see.

Users who want to opt out (e.g. DevDuck running on a laptop without
cyclonedds installed) can set DEVDUCK_TOOLS to a custom string that
omits them: tools are registered only when listed. Even when listed,
dds_peer's start() reports a clean 'cyclonedds not installed' error
without crashing the process.
Complementary to the deep research doc, this is the short 'how to use
it today' guide: quickstart commands, architecture diagram of the
shipping pieces, the live verification log from Thor, and the
remaining roadmap.

Placed at docs/ros2-native-integration.md (alongside the deeper
research doc at docs/research/ros2-native-integration.md) so that
the end-user README stays short while the design rationale stays
discoverable.
…rfaces/AddTwoInts

Extends the bundled IDL registry from 28 -> 35 ROS2 types:

rcl_interfaces messages (present on every live ROS2 node):
  - ParameterType, ParameterValue, Parameter
  - ParameterEvent   (wire type for /parameter_events)
  - Log              (wire type for /rosout)

example_interfaces service types (enables use_ros 'call' action):
  - AddTwoInts_Request
  - AddTwoInts_Response

Service types carry the rmw_cyclonedds_cpp correlation header that
wraps every ROS2 service request/reply on the wire:

    typedef struct cdds_request_header {
        uint64_t guid;   // lower 64 bits of client writer GUID
        int64_t  seq;    // monotonic per-client sequence number
    } cdds_request_header_t;

Header layout verified against real ROS2 Jazzy traffic (confirmed
via rmw_cyclonedds_cpp/src/serdata.hpp and on-wire observation of
demo_nodes_cpp add_two_ints_server).

Wire format for rmw_fastrtps differs slightly and is not covered
by this commit.
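
The 16-byte layout of that header can be checked with Python's struct module. This sketch covers only the two header fields in little-endian order; the CDR encapsulation bytes and the service payload that follows are left out, and the helper names are hypothetical.

```python
import struct
from typing import Tuple

# Little-endian: uint64 guid followed by int64 seq, no padding.
REQUEST_HEADER = struct.Struct("<Qq")


def pack_header(guid: int, seq: int) -> bytes:
    """Serialise the correlation header fields."""
    return REQUEST_HEADER.pack(guid, seq)


def unpack_header(data: bytes) -> Tuple[int, int]:
    """Read (guid, seq) from the first 16 bytes of data."""
    return REQUEST_HEADER.unpack_from(data)
```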
…invocation

Adds two new actions to the agent-facing ROS2 tool:

  list_services   — enumerate discovered ROS2 services by joining
                    rq/*Request + rr/*Reply topic pairs observed via
                    SEDP, with pretty pkg/srv/Name hints for 'call'.
  call            — invoke a ROS2 service and wait for its reply,
                    with automatic client_guid + sequence_number
                    header handling and per-call reply correlation.
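
The rq/rr join behind list_services might be sketched as below, operating on a plain list of discovered DDS topic names. `pair_services` is a hypothetical name, and requiring both halves of the pair is an assumption about how the join is done.

```python
from typing import Iterable, List


def pair_services(dds_topics: Iterable[str]) -> List[str]:
    """Return ROS2 service names that have both an rq/ and an rr/ topic."""
    topics = list(dds_topics)
    requests = {t[3:-len("Request")] for t in topics
                if t.startswith("rq/") and t.endswith("Request")}
    replies = {t[3:-len("Reply")] for t in topics
               if t.startswith("rr/") and t.endswith("Reply")}
    # Only names seen on both sides count as a complete service.
    return sorted("/" + name for name in requests & replies)
```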

Implementation details:
  - A random 64-bit client GUID is drawn once per process.
  - A monotonic sequence number identifies each in-flight request.
  - The response reader drains stale samples before firing so a
    previous client's replies can't match our correlation header.
  - Reply matching requires (client_guid, sequence_number) to match
    exactly; other clients' traffic is ignored.

This is the rmw_cyclonedds_cpp service wire format; rmw_fastrtps
uses a different wrapping and is not supported by this commit.
Error message on timeout explicitly calls this out so agents can
route to the right retry path.
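
The correlation rule can be sketched without any DDS machinery. Replies are modelled here as (guid, seq, body) tuples, and the names `next_request_id` and `match_reply` are hypothetical.

```python
import itertools
import secrets
from typing import Iterable, Optional, Tuple

CLIENT_GUID = secrets.randbits(64)  # drawn once per process
_sequence = itertools.count(1)      # monotonic per-client counter


def next_request_id() -> Tuple[int, int]:
    """Return the (client_guid, sequence_number) for a new request."""
    return CLIENT_GUID, next(_sequence)


def match_reply(replies: Iterable[Tuple[int, int, dict]],
                request_id: Tuple[int, int]) -> Optional[dict]:
    """Return the body of the reply whose header matches exactly."""
    for guid, seq, body in replies:
        if (guid, seq) == request_id:
            return body
    return None  # nothing matched; other clients' traffic is ignored
```

A caller would keep polling the reply reader and feeding samples to `match_reply` until it returns a body or the timeout expires.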

Usage:
    use_ros(action='list_services')
    use_ros(action='call',
            service='/add_two_ints',
            srv_type='example_interfaces/srv/AddTwoInts',
            msg={'a': 17, 'b': 25},
            timeout=5.0)
Adds tests/integration/test_ros2_interop.py — a self-contained
Docker-based integration test that validates DevDuck's DDS + ROS2
tool stack against real ROS2 Jazzy traffic from demo_nodes_cpp.

The harness:
  1. Launches a ros:jazzy container with RMW_IMPLEMENTATION=
     rmw_cyclonedds_cpp
  2. Installs demo_nodes_cpp (/chatter publisher + /add_two_ints
     service server) and example_interfaces
  3. Builds the cyclonedds Python wheel against the system lib so
     our pure-Python code speaks the same protocol as ROS2
  4. Copies devduck/tools/{_ros_msgs,dds_peer,use_ros}.py into the
     container and exercises them in-process

Verifies 19 assertions across all major features:
  - dds_peer participant + endpoint discovery via SPDP/SEDP
  - use_ros list_topics with rt/* -> /* mapping
  - echo /chatter against a real ROS2 C++ publisher
  - tail /chatter streaming into the event_bus
  - pub/echo round-trip on a fresh topic
  - /rosout decoding using the new rcl_interfaces/Log type
  - /parameter_events type registry lookup
  - list_services enumerating rq/rr pairs from live discovery
  - call /add_two_ints asserting 17+25=42 end-to-end

Runs on macOS (Docker Desktop) or Linux; on macOS all DDS traffic
stays inside the single container because Docker Desktop does not
bridge host-level multicast into containers (a documented corner case).

First run ~2 min (image pull + cyclonedds wheel build); subsequent
runs ~30 s thanks to apt + pip wheel caching in the layer.

Run:
    python3 tests/integration/test_ros2_interop.py
@cagataycali

Owner Author

Status update — real ROS2 validation, ready to review

Three new commits pushed on top of the draft (576506a, e68ac60, 454a4e5):

  • feat(_ros_msgs) — 28 → 35 bundled IDL types: added rcl_interfaces (Log, ParameterEvent, Parameter, ParameterValue, ParameterType) so /rosout and /parameter_events stop showing (topic, unknown), plus example_interfaces/AddTwoInts_Request/_Response with the correct cdds_request_header_t layout (uint64 guid + int64 seq, 16 bytes — confirmed by reading rmw_cyclonedds_cpp/src/serdata.hpp and by live wire observation).
  • feat(use_ros) — new call + list_services actions. call generates a per-process random 64-bit client GUID, bumps a monotonic sequence number per call, drains stale replies, fires the request on rq/...Request, and waits on rr/...Reply for the sample whose (client_guid, sequence_number) matches. Mismatched replies from other clients on the same topic are ignored.
  • test(integration) — fully automated Docker-based E2E harness at tests/integration/test_ros2_interop.py. Pulls ros:jazzy, installs demo_nodes_cpp + example_interfaces + rmw_cyclonedds_cpp, builds the cyclonedds Python wheel against the system lib, copies our three tool sources in, and exercises them in-process against a live demo_nodes_cpp talker + add_two_ints_server.

Validation against real ROS2 Jazzy

One command:

python3 tests/integration/test_ros2_interop.py

19/19 assertions pass, including:

  • Discovery of multiple real DDS participants (talker + add_two_ints_server + our test process)
  • Decoding real std_msgs/String bytes from a C++ publisher
  • tail streaming into event_bus at rate cap
  • Round-trip pub → echo on a fresh topic
  • /rosout rcl_interfaces/msg/Log decoding with real talker log lines
  • list_services enumerates all 15 discovered services from rq/rr pairs
  • call /add_two_ints {a: 17, b: 25} → {sum: 42} from a real C++ server

Superseding #7

The relation to #7 is clarified there; #7 is being closed in favor of this PR. #7 covered the skeleton (lifecycle + builtin reader discovery + a single std_msgs/String IDL) but explicitly punted real ROS2 interop to "phase 2 input". This PR validates against real ROS2 and extends the surface to call, tail, and 35 message types.

Lessons learned while validating (documented in the code)

  • The rmw_cyclonedds_cpp service wire header is {uint64 guid, int64 seq} (16 B), not {uint64, uint64, int64} (24 B). Initial guess failed silently because the server couldn't align our payload. Fixed + documented inline in _ros_msgs.py.
  • docker logs relays ROS2's stderr output on its own stderr channel, not stdout, so the harness polls both.
  • Docker Desktop on macOS can't bridge host multicast into containers. Harness runs everything inside a single container namespace so this works on Mac, Linux host-networking, and CI identically.

Moving out of draft. Ready for review.

- Rewrite use_ros with rclpy + dynamic type resolution (future-proof via getattr)
- Native + Docker dual backend with auto-detection
- Remove legacy DDS-based use_ros + _ros_msgs registry
- Remove test_ros2_interop.py (specific to legacy DDS architecture)
- Add use_mavlink: universal MAVLink drone control via pymavlink
  - Supports ArduPilot/PX4/any MAVLink vehicle
  - High-level: arm/disarm/takeoff/land/rtl/set_mode/goto/velocity
  - Introspection: list_messages/get_message/stream
  - Raw send via getattr dispatch (same pattern as use_ros)
- Keep dds_peer as standalone raw-DDS tool (useful outside ROS2)
- Register use_ros + use_mavlink in default DEVDUCK_TOOLS