feat: native ROS2 integration over DDS (CycloneDDS) #8
Open
cagataycali wants to merge 12 commits into
Design exploration for making DevDuck natively interoperable with any ROS2 fleet over DDS, without requiring rclpy on the DevDuck host. Covers:
- Why DDS (vs zenoh/zcm) for ROS2 interop
- Architecture: new dds_peer (low-level) + use_ros (opinionated) tools
- ROS2-on-DDS wire format (rt/, rq/, rr/ prefixes, CDR, QoS)
- SPDP/SEDP auto-discovery via DCPS built-in topics
- Vision/perception pipeline via use_aws, use_google, use_agents
- Thor integration test plan
- 11-commit surgical roadmap
No code changes; documentation only.
Adds a new tool `dds_peer` that hosts a CycloneDDS DomainParticipant
inside DevDuck. This is the foundation for native ROS2 interop — every
subsequent commit layers discovery, pub/sub, and ROS2 type handling on
top of this state machine.
Actions in this commit:
- start : create a DomainParticipant on the given domain_id
(honors ROS_DOMAIN_ID env var, defaults to 0 like ROS2)
- stop : tear down the participant and clear registries
- status : report liveness, uptime, and (empty) counters
Design:
- Lazy-imports cyclonedds so the tool loads cleanly on hosts without
the package; start() returns a helpful error instead of crashing.
- Mirrors the ZENOH_STATE / ZCM_STATE module-global pattern used by
the existing peer tools, guarded by a single RLock.
- Emits dds.start / dds.stop on devduck.tools.event_bus when available
(best-effort, no hard dependency).
- Assigns a stable instance_id '{host}-dds-{uuid6}' for use in later
discovery/presence wiring.
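A minimal sketch of the lifecycle described above. The names `DDS_STATE`, `resolve_domain_id`, and `make_instance_id` are illustrative assumptions, not taken from the actual tool; the only CycloneDDS call used is `cyclonedds.domain.DomainParticipant`.

```python
import os
import socket
import threading
import time
import uuid

# Hypothetical module-global state, guarded by a single RLock.
DDS_STATE = {"running": False, "participant": None, "started_at": None,
             "domain_id": None, "instance_id": None}
_LOCK = threading.RLock()


def resolve_domain_id(domain_id=None):
    """Explicit argument wins, then ROS_DOMAIN_ID, then 0 (the ROS2 default)."""
    if domain_id is not None:
        return int(domain_id)
    return int(os.environ.get("ROS_DOMAIN_ID") or "0")


def make_instance_id():
    """Build an id shaped like '{host}-dds-{uuid6}'."""
    return f"{socket.gethostname()}-dds-{uuid.uuid4().hex[:6]}"


def start(domain_id=None):
    with _LOCK:
        if DDS_STATE["running"]:
            return {"status": "success", "text": "already running"}
        try:
            # Lazy import: this module still loads on hosts without cyclonedds.
            from cyclonedds.domain import DomainParticipant
        except ImportError:
            return {"status": "error", "text": "cyclonedds not installed"}
        did = resolve_domain_id(domain_id)
        DDS_STATE.update(running=True, participant=DomainParticipant(did),
                         started_at=time.time(), domain_id=did,
                         instance_id=make_instance_id())
        return {"status": "success", "text": f"started on domain {did}"}


def stop():
    with _LOCK:
        # Dropping the reference lets the participant be torn down.
        DDS_STATE.update(running=False, participant=None, started_at=None)
        return {"status": "success", "text": "stopped"}


def status():
    with _LOCK:
        started = DDS_STATE["started_at"]
        return {"running": DDS_STATE["running"],
                "uptime_s": (time.time() - started) if started else 0.0}
```

Calling `start()` twice returns early on the second call, which is one way to get the idempotency mentioned in the verification notes.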
Verified live on Thor (aarch64, Ubuntu 24.04):
- start/stop/status/idempotency all green
- participant GUID is real DDS GUID (0110acbc-3963-f908-d099-8f23…)
- no crash when cyclonedds is missing (graceful error)
Not yet implemented (roadmap commits 3-11):
- SPDP/SEDP discovery loop populating participants/topic_types
- pub/sub actions + ROS2 topic name mapping (rt/… prefix)
- IDL registry for common ROS2 message types
- use_ros high-level wrapper tool
- Vision bridges into use_aws / use_google
- use_agents routing layer
Adds a background discovery thread that continuously reads CycloneDDS
built-in topics (DCPSParticipant, DCPSPublication, DCPSSubscription)
and keeps live registries of every DDS peer, publisher, and subscriber
on the LAN — which, on a ROS2 domain, means every ROS2 node, topic,
and type.
New state fields:
  participants    guid -> {first_seen, last_seen}
  publications    (topic, type) -> {participant, classification}
  subscriptions   (topic, type) -> {participant, classification}
  topic_types     topic -> type (live)
New actions:
  list_participants    every DDS participant (ROS2 node) on the LAN
  list_topics          every topic, with ROS2 name mapping (rt/X -> /X)
  list_publications    publisher endpoints (topic, type)
  list_subscriptions   subscriber endpoints (topic, type)
ROS2 wire conventions recognised:
rt/<name> -> regular topic
rq/<name>Request -> service request
rr/<name>Reply -> service reply
(everything else classified as 'raw')
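The prefix rules above can be sketched as a single function. The name `classify_dds_topic` and the labels `service_request` / `service_reply` are assumptions for illustration; only `topic` and `raw` appear in the commit message.

```python
def classify_dds_topic(dds_name):
    """Return (ros_name, classification) for a raw DDS topic name."""
    if dds_name.startswith("rt/"):
        return "/" + dds_name[3:], "topic"
    if dds_name.startswith("rq/") and dds_name.endswith("Request"):
        return "/" + dds_name[3:-len("Request")], "service_request"
    if dds_name.startswith("rr/") and dds_name.endswith("Reply"):
        return "/" + dds_name[3:-len("Reply")], "service_reply"
    # Anything without a ROS2 prefix is left untouched.
    return dds_name, "raw"
```

For example, `classify_dds_topic("rt/chatter")` yields `("/chatter", "topic")`, matching the mapping reported in the verification notes.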
Liveness:
Entries refreshed every DISCOVERY_POLL_INTERVAL (1s).
Entries older than DISCOVERY_STALE_AFTER (30s) are reaped, with
dds.participant.leave events emitted to event_bus.
Event bus events emitted:
dds.participant.join on first sighting of a new peer
dds.participant.leave on reaping a stale peer
dds.endpoint.new on first sighting of a new publisher/subscriber
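A rough sketch of the join/leave bookkeeping described above, assuming a plain dict registry and an `emit(kind, payload)` callback. The function names `observe` and `reap_stale` are hypothetical; the 30 s threshold and event names come from the text.

```python
DISCOVERY_STALE_AFTER = 30.0  # seconds, as stated above


def observe(participants, guid, now, emit):
    """Record a sighting; emit a join event the first time a GUID appears."""
    entry = participants.get(guid)
    if entry is None:
        participants[guid] = {"first_seen": now, "last_seen": now}
        emit("dds.participant.join", {"guid": guid})
    else:
        entry["last_seen"] = now


def reap_stale(participants, now, emit):
    """Drop entries not refreshed within the stale window; return their GUIDs."""
    stale = [guid for guid, entry in participants.items()
             if now - entry["last_seen"] > DISCOVERY_STALE_AFTER]
    for guid in stale:
        del participants[guid]
        emit("dds.participant.leave", {"guid": guid})
    return stale
```

Passing `now` in explicitly keeps the logic testable without sleeping; a discovery thread would call both functions once per poll interval with a monotonic clock reading.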
Verified live on Thor (two DDS participants, one domain):
- Registry correctly shows both participant GUIDs
- Topic 'rt/chatter' with type 'demo::Greeting' discovered and
mapped to ROS2 name '/chatter', classification 'topic'
- Publication correctly attributed to the talker's participant GUID
- Clean start/stop cycle (thread joins within 2s)
Next commit: bundle ROS2 message IDL stubs (std_msgs, geometry_msgs,
sensor_msgs, nav_msgs, tf2_msgs) so we can actually subscribe and
publish typed messages instead of just observing discovery.
Ships Python twins (IdlStruct dataclasses) of the most common ROS2
message types so DevDuck can subscribe and publish typed messages to
a ROS2 fleet WITHOUT needing rclpy or any ROS2 install.
Types bundled (28 total, covering the message types most commonly seen on ROS2 fleets):
builtin_interfaces: Time, Duration
std_msgs: Header, String, Bool, Int32, Float32, Float64
geometry_msgs: Vector3, Point, Quaternion, Pose, PoseStamped,
PoseWithCovariance, Twist, TwistStamped,
TwistWithCovariance, Transform, TransformStamped
sensor_msgs: LaserScan, Imu, JointState, Image
nav_msgs: Odometry
tf2_msgs: TFMessage
diagnostic_msgs: KeyValue, DiagnosticStatus, DiagnosticArray
Each class declares the exact on-the-wire DDS typename ROS2 uses
(e.g. 'geometry_msgs::msg::dds_::Twist_'), so the CDR bytes are
byte-compatible with what an `rclpy` node would publish.
Lookup helpers:
  ros_type_to_idl(name)   accepts both 'geometry_msgs/msg/Twist' and the
                          DDS wire name; returns the IdlStruct class,
                          or None (use_ros falls back to opaque bytes).
  known_types()           sorted list of everything bundled.
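The two accepted spellings differ only mechanically, so the lookup can be sketched as a name normalisation followed by a dict lookup. `REGISTRY`, `to_dds_type_name`, and the stand-in `Twist` class below are assumptions for illustration; the real module maps to IdlStruct classes.

```python
REGISTRY = {}  # hypothetical: DDS wire name -> message class


def to_dds_type_name(name):
    """'pkg/msg/Name' -> 'pkg::msg::dds_::Name_'; wire names pass through."""
    if "::" in name:
        return name
    pkg, kind, typ = name.split("/")
    return f"{pkg}::{kind}::dds_::{typ}_"


def ros_type_to_idl(name):
    """Return the registered class for either spelling, or None if unknown."""
    try:
        return REGISTRY.get(to_dds_type_name(name))
    except ValueError:  # name was not of the form pkg/kind/Name
        return None


def known_types():
    return sorted(REGISTRY)


class Twist:  # stand-in for the bundled IdlStruct class
    pass


REGISTRY["geometry_msgs::msg::dds_::Twist_"] = Twist
```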
Verified live on Thor with CDR round-trips on domain 0:
- Twist (simple flat struct) ✓
- TFMessage (sequence of local structs) ✓
- LaserScan (float32 sequences) ✓
- Odometry (3-level nested structs) ✓
- DiagnosticArray (seq of structs of seqs) ✓
Gotcha captured: CycloneDDS 0.10.x evaluates annotations at IDL
population time via getattr on the module. Must NOT use
'from __future__ import annotations' here; runtime-concrete types
are required. The module docstring and a future CI check will guard
against regression.
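The snippet below illustrates the general Python behaviour behind this gotcha, not CycloneDDS internals: with the `__future__` import, annotations are stored as strings rather than concrete types, so any library that inspects them at class-creation time sees `'float'` instead of `float`. The `Sample` class is purely illustrative.

```python
SOURCE = "class Sample:\n    x: float\n"


def annotations_of(source):
    """Execute a class definition and return the class's annotations."""
    namespace = {}
    exec(source, namespace)
    return namespace["Sample"].__annotations__


# Without the future import the annotation is the type object itself.
eager = annotations_of(SOURCE)
# With it, the same annotation is kept as the source string.
deferred = annotations_of("from __future__ import annotations\n" + SOURCE)

print(type(eager["x"]).__name__, type(deferred["x"]).__name__)
```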
Adds `use_ros`, the opinionated wrapper an agent actually reaches for.
Where dds_peer handles DDS plumbing, use_ros speaks ROS2 vocabulary:
topics, messages, nodes. No rclpy or ROS2 install required on the host.
Actions shipped:
list_nodes every DDS participant (ROS2 node) on the domain
list_topics ROS2 topics, with 'known/unknown' tag per type
types list all bundled IDL types we can decode
echo one-shot read, returns JSON-friendly dict
pub publish one sample, accepts JSON-friendly dict
Design:
- Auto-starts dds_peer if idle; reuses its DomainParticipant.
- Topic-name forgiving: accepts '/scan', 'rt/scan', or 'scan'.
- Type resolution:
1. explicit type kwarg wins
2. falls back to live DDS discovery (topic -> type)
3. helpful error if neither is available
- Message class via _ros_msgs.ros_type_to_idl (accepts both
'geometry_msgs/msg/Twist' and the DDS wire name).
- DataReader+DataWriter pair is built lazily per (topic, type) and
cached, so repeated echo/pub calls are fast.
- Message <-> dict conversion walks nested dataclasses recursively.
- Preview formatter truncates long sequences (Image.data, point
cloud-sized arrays) so agent context stays readable.
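The topic-name and type-resolution rules above can be sketched as follows. The function names and the `discovered` mapping (DDS topic name to type name) are assumptions for illustration.

```python
def to_dds_topic(name):
    """Accept '/scan', 'rt/scan', or 'scan' and return the DDS name 'rt/scan'."""
    if name.startswith("rt/"):
        return name
    return "rt/" + name.lstrip("/")


def resolve_type(topic, explicit_type=None, discovered=None):
    """Explicit type wins, then live discovery; otherwise raise a helpful error."""
    if explicit_type:
        return explicit_type
    dds_topic = to_dds_topic(topic)
    found = (discovered or {}).get(dds_topic)
    if found:
        return found
    raise LookupError(
        f"no type known for {dds_topic!r}: pass an explicit type "
        "or wait for discovery to see a publisher")
```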
Verified live on Thor with a background Twist talker:
- list_topics discovered '/cmd_vel [geometry_msgs::msg::dds_::Twist_]
(topic, known)' ✓
- echo with explicit type -> parsed Twist -> pretty JSON ✓
- echo with auto-discovered type -> same ✓
- pub {'linear':{'x':0.5}, 'angular':{'z':0.3}} -> DDS write OK ✓
- Error paths (unknown type, missing topic) return helpful text ✓
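The `pub` payload above is a partial dict, so the conversion has to fill unspecified fields with defaults while recursing into nested messages. A minimal sketch, using plain dataclasses as stand-ins for the bundled IdlStruct classes (the helper name `from_dict` is an assumption):

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class Vector3:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


@dataclass
class Twist:
    linear: Vector3 = field(default_factory=Vector3)
    angular: Vector3 = field(default_factory=Vector3)


def from_dict(cls, data):
    """Build cls from a possibly partial dict, recursing into nested dataclasses."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # keep the field's declared default
        value = data[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


msg = from_dict(Twist, {"linear": {"x": 0.5}, "angular": {"z": 0.3}})
```

The reverse direction for plain dataclasses is `dataclasses.asdict(msg)`, which already walks nested instances.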
Follow-ups (later commits):
- tail: streaming bridge that emits event_bus ros.<topic> entries
- call: ROS2 service request/reply
- bag_record: short capture to disk
- Opaque-byte fallback for unknown types
- Vision pipeline hooks (use_aws / use_google) for Image topics
Adds 'tail'/'untail'/'list_tails' actions that spin up a background
subscriber per topic and push every received message into the shared
devduck.tools.event_bus as 'ros.<topic_name>' events.
Why:
The agent's dynamic context already absorbs event_bus entries (via
get_context_string in the ambient-input pipeline). Tailing a robot
topic therefore makes ROS2 data *natively* visible to DevDuck on
every turn, without the agent having to manually poll.
Behaviour:
- tail (topic, type?, max_hz=5.0) starts a daemon thread that reads
the DDS DataReader in a tight take() loop and rate-limits emits.
- Samples above max_hz are coalesced to the latest (we always emit
the most recent value per window, keeping context fresh).
- list_tails shows per-topic receive count, rate, uptime, and cap.
- untail joins the thread within 1s and clears cached endpoints.
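One way to get the "coalesce to latest" behaviour is to overwrite a single pending slot on every sample and release it at most once per window. The class below is a sketch under that assumption; its name and the injected `clock` parameter are illustrative, not the tool's actual structure.

```python
class LatestValueLimiter:
    """Keep only the newest sample and release it at most max_hz times per second."""

    def __init__(self, max_hz, clock):
        self.min_interval = 1.0 / max_hz
        self.clock = clock          # callable returning seconds, e.g. time.monotonic
        self.pending = None
        self.has_pending = False
        self.last_emit = None

    def offer(self, sample):
        # Newer samples overwrite older ones that were never emitted.
        self.pending = sample
        self.has_pending = True

    def poll(self):
        """Return the newest pending sample if the window has elapsed, else None."""
        if not self.has_pending:
            return None
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.min_interval:
            return None
        self.last_emit = now
        self.has_pending = False
        sample, self.pending = self.pending, None
        return sample
```

A tail thread would call `offer()` for every sample taken from the reader and `poll()` once per loop iteration, emitting to the event bus whenever `poll()` returns a value.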
Bug fix, bundled for atomicity:
- dds_peer._emit was calling bus.emit(event_type, payload, source=)
but the real EventBus.emit signature is
emit(event_type, source, summary, detail, metadata).
Every emit was silently dropped by the outer try/except. Fixed to
build a short summary + full-payload metadata, so dds.start,
dds.participant.join, dds.endpoint.new now actually reach the TUI
and agent context.
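The failure mode is easy to reproduce with a stand-in bus. In the sketch below, `EventBus` only mimics the parameter order quoted above (the defaults for `detail` and `metadata` are an assumption), and `emit_buggy` / `emit_fixed` are hypothetical names for the before and after call shapes.

```python
class EventBus:
    """Stand-in bus with the parameter order quoted above."""

    def __init__(self):
        self.events = []

    def emit(self, event_type, source, summary, detail="", metadata=None):
        self.events.append((event_type, source, summary, metadata or {}))


def emit_buggy(bus, event_type, payload):
    try:
        # payload binds positionally to `source`, then source= is passed again:
        # TypeError (multiple values for argument 'source').
        bus.emit(event_type, payload, source="dds_peer")
        return True
    except Exception:
        return False  # the best-effort wrapper hides the failure


def emit_fixed(bus, event_type, payload):
    try:
        summary = f"{event_type}: {', '.join(sorted(payload))}"
        bus.emit(event_type, "dds_peer", summary, metadata=payload)
        return True
    except Exception:
        return False
```

Logging the swallowed exception at debug level inside the `except` branch would have surfaced the mismatch much earlier while keeping the emit best-effort.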
Verified live on Thor against a 2 Hz Twist talker:
- tail started, 5 'ros.cmd_vel' events reached event_bus with live
growing linear.x values (0.7 → 0.8 → 0.9) within 3 s ✓
- Cap honored: 2.0 Hz max emits, samples above it coalesced ✓
- untail cleanly stops the loop ✓
- dds_peer events newly visible on bus: dds.start (1),
dds.participant.join (2), dds.endpoint.new (3) ✓
Known non-blocking rough edge: the displayed receive rate uses an EMA
on instantaneous samples/dt and can spike for short bursts. Actual
emit cap is respected; only the cosmetic field is noisy. Will polish
in a later commit alongside the vision pipeline.
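To see why the displayed rate spikes, consider an EMA over instantaneous rates `1/dt`. The smoothing factor `alpha=0.3` and the function name are assumptions; the actual values used by the tool are not stated.

```python
def ema_rates(arrival_times, alpha=0.3):
    """EMA over instantaneous rates 1/dt between consecutive arrivals."""
    ema, out = None, []
    for prev, cur in zip(arrival_times, arrival_times[1:]):
        inst = 1.0 / (cur - prev)
        ema = inst if ema is None else alpha * inst + (1 - alpha) * ema
        out.append(ema)
    return out


# Steady 2 Hz arrivals, then two samples only 10 ms apart.
rates = ema_rates([0.0, 0.5, 1.0, 1.5, 1.51, 2.0])
```

A single pair of back-to-back samples contributes an instantaneous rate near 100 Hz, which dominates the average for several updates even though the long-run rate is unchanged. Counting samples over a fixed window is a common, less jumpy alternative.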
Wires the two new tools into DevDuck's default tool set so they're
available to the agent out-of-the-box on any host that has the
cyclonedds Python package installed (our target: Thor and similar
Jetson/Linux robot gateways).
Changes:
- devduck/__init__.py: add 'dds_peer' and 'use_ros' to both default
DEVDUCK_TOOLS strings (--mcp mode and standard mode).
- docs/research/ros2-native-integration.md: mark commits 1-7 as
shipped, keep 8-11 as the remaining deep-vision / AWS / Google /
use_agents routing work.
No server-side hook is needed: dds_peer doesn't expose a socket
server like tcp/ws/mcp. Instead, use_ros auto-calls dds_peer._start
on first use (lazy bring-up), so 'use_ros' is the only surface the
agent needs to see.
Users who want to opt out (e.g. DevDuck running on a laptop without
cyclonedds installed) can set DEVDUCK_TOOLS to a custom string that
omits them: tools are registered only when listed. Even when they are
included, dds_peer's start() reports a clean 'cyclonedds not installed'
error without crashing the process.
Complementary to the deep research doc, this is the short 'how to use it today' guide: quickstart commands, architecture diagram of the shipping pieces, the live verification log from Thor, and the remaining roadmap. Placed at docs/ros2-native-integration.md (alongside the deeper research doc at docs/research/ros2-native-integration.md) so that the end-user README stays short while the design rationale stays discoverable.
…rfaces/AddTwoInts
Extends the bundled IDL registry from 28 -> 35 ROS2 types:
rcl_interfaces messages (present on every live ROS2 node):
- ParameterType, ParameterValue, Parameter
- ParameterEvent (wire type for /parameter_events)
- Log (wire type for /rosout)
example_interfaces service types (enables use_ros 'call' action):
- AddTwoInts_Request
- AddTwoInts_Response
Service types carry the rmw_cyclonedds_cpp correlation header that
wraps every ROS2 service request/reply on the wire:
  typedef struct cdds_request_header {
    uint64_t guid;  // lower 64 bits of client writer GUID
    int64_t seq;    // monotonic per-client sequence number
  } cdds_request_header_t;
Header layout verified against real ROS2 Jazzy traffic (confirmed
via rmw_cyclonedds_cpp/src/serdata.hpp and on-wire observation of
demo_nodes_cpp add_two_ints_server).
Wire format for rmw_fastrtps differs slightly and is not covered
by this commit.
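For illustration, the two header fields from the struct above can be packed with the standard `struct` module. This sketch assumes little-endian byte order and covers only the 16 header bytes, not the CDR encapsulation header or the request payload that surround it on the wire; the helper names are hypothetical.

```python
import struct

# uint64 guid followed by int64 seq, little-endian (assumed), 16 bytes total.
_HEADER = struct.Struct("<Qq")


def pack_request_header(guid, seq):
    return _HEADER.pack(guid, seq)


def unpack_request_header(buf):
    """Return (guid, seq) read from the start of buf."""
    return _HEADER.unpack_from(buf, 0)


raw = pack_request_header(0x0123456789ABCDEF, 7)
```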
…invocation
Adds two new actions to the agent-facing ROS2 tool:
  list_services   enumerate discovered ROS2 services by joining
                  rq/*Request + rr/*Reply topic pairs observed via
                  SEDP, with pretty pkg/srv/Name hints for 'call'.
  call            invoke a ROS2 service and wait for its reply,
                  with automatic client_guid + sequence_number
                  header handling and per-call reply correlation.
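The pairing step behind `list_services` can be sketched as a set intersection over discovered DDS topic names. The function below is an illustration that takes a plain list of names; how the real tool reads them from its registries is not shown here.

```python
def list_services(dds_topics):
    """Return ROS2 service names that have both a request and a reply topic."""
    requests = {t[3:-len("Request")] for t in dds_topics
                if t.startswith("rq/") and t.endswith("Request")}
    replies = {t[3:-len("Reply")] for t in dds_topics
               if t.startswith("rr/") and t.endswith("Reply")}
    # Only names seen on both sides count as a complete service.
    return sorted("/" + name for name in requests & replies)
```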
Implementation details:
- A random 64-bit client GUID is drawn once per process.
- A monotonic sequence number identifies each in-flight request.
- The response reader drains stale samples before firing so a
previous client's replies can't match our correlation header.
- Reply matching requires (client_guid, sequence_number) to match
exactly; other clients' traffic is ignored.
This is the rmw_cyclonedds_cpp service wire format; rmw_fastrtps
uses a different wrapping and is not supported by this commit.
Error message on timeout explicitly calls this out so agents can
route to the right retry path.
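The correlation rules above can be sketched without any DDS objects. Here replies are modelled as `(guid, seq, payload)` tuples; `ServiceClient` and `match_reply` are hypothetical names.

```python
import itertools
import secrets


class ServiceClient:
    def __init__(self):
        self.guid = secrets.randbits(64)   # drawn once, reused for every call
        self._seq = itertools.count(1)     # monotonic sequence numbers

    def next_header(self):
        """Return the (guid, seq) pair to attach to the next request."""
        return (self.guid, next(self._seq))


def match_reply(header, replies):
    """Return the payload of the first reply whose (guid, seq) equals header."""
    for reply_guid, reply_seq, payload in replies:
        if (reply_guid, reply_seq) == header:
            return payload
    return None  # nothing matched; a caller would keep waiting until timeout
```

Matching on the full pair is what lets several clients share one reply topic: a reply with the right sequence number but a different GUID is simply skipped.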
Usage:
  use_ros(action='list_services')
  use_ros(action='call',
          service='/add_two_ints',
          srv_type='example_interfaces/srv/AddTwoInts',
          msg={'a': 17, 'b': 25},
          timeout=5.0)
Adds tests/integration/test_ros2_interop.py — a self-contained
Docker-based integration test that validates DevDuck's DDS + ROS2
tool stack against real ROS2 Jazzy traffic from demo_nodes_cpp.
The harness:
1. Launches a ros:jazzy container with RMW_IMPLEMENTATION=
rmw_cyclonedds_cpp
2. Installs demo_nodes_cpp (/chatter publisher + /add_two_ints
service server) and example_interfaces
3. Builds the cyclonedds Python wheel against the system lib so
our pure-Python code speaks the same protocol as ROS2
4. Copies devduck/tools/{_ros_msgs,dds_peer,use_ros}.py into the
container and exercises them in-process
Verifies 19 assertions across all major features:
- dds_peer participant + endpoint discovery via SPDP/SEDP
- use_ros list_topics with rt/* -> /* mapping
- echo /chatter against a real ROS2 C++ publisher
- tail /chatter streaming into the event_bus
- pub/echo round-trip on a fresh topic
- /rosout decoding using the new rcl_interfaces/Log type
- /parameter_events type registry lookup
- list_services enumerating rq/rr pairs from live discovery
- call /add_two_ints asserting 17+25=42 end-to-end
Runs on macOS (Docker Desktop) or Linux; on macOS all DDS traffic
stays inside the single container because Docker Desktop does not
bridge host-level multicast into containers (documented corner).
First run ~2 min (image pull + cyclonedds wheel build); subsequent
runs ~30 s thanks to apt + pip wheel caching in the layer.
Run:
python3 tests/integration/test_ros2_interop.py
Status update: real ROS2 validation, ready for review
Three new commits pushed on top of the draft (
Validation against real ROS2 Jazzy
One command: 19/19 assertions pass, including:
Superseding #7
Relation to #7 clarified there; closing #7 in its favor. #7 covered the skeleton (lifecycle + builtin reader discovery + a single
Lessons learned while validating (documented in the code)
Moving out of draft. Ready for review.
- Rewrite use_ros with rclpy + dynamic type resolution (future-proof via getattr)
- Native + Docker dual backend with auto-detection
- Remove legacy DDS-based use_ros + _ros_msgs registry
- Remove test_ros2_interop.py (specific to legacy DDS architecture)
- Add use_mavlink: universal MAVLink drone control via pymavlink
  - Supports ArduPilot/PX4/any MAVLink vehicle
  - High-level: arm/disarm/takeoff/land/rtl/set_mode/goto/velocity
  - Introspection: list_messages/get_message/stream
  - Raw send via getattr dispatch (same pattern as use_ros)
- Keep dds_peer as standalone raw-DDS tool (useful outside ROS2)
- Register use_ros + use_mavlink in default DEVDUCK_TOOLS
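Dynamic type resolution via `getattr` might look roughly like the sketch below, which relies on ROS2 Python message packages being importable as `pkg.msg` modules with one class per message. The function name is an assumption, and the commit's actual implementation may differ.

```python
import importlib


def resolve_msg_class(type_name):
    """'pkg/msg/Name' -> the class Name from the module pkg.msg."""
    pkg, kind, name = type_name.split("/")
    module = importlib.import_module(f"{pkg}.{kind}")
    # getattr keeps this generic: any installed message package works
    # without a hand-maintained registry.
    return getattr(module, name)
```

On a host with ROS2 installed, `resolve_msg_class("geometry_msgs/msg/Twist")` would return the same class as `from geometry_msgs.msg import Twist`.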
What
Native ROS2 integration for DevDuck over DDS (CycloneDDS). No rclpy, no ROS2 install required on the DevDuck host. A DevDuck agent can discover every ROS2 node on the LAN, subscribe, publish, and stream live topic data into its dynamic context.

Why
DevDuck already has agent-to-agent transports (`zenoh_peer`, `zcm_peer`). It had no direct path to talk to real ROS2 robots. This PR closes that gap so the same DevDuck binary that controls a fleet via chat can also read `/scan`, publish `/cmd_vel`, and stream `/tf` into agent context.

What ships here (8 surgical commits)
1. `docs(research)`: full design doc (docs/research/ros2-native-integration.md)
2. `feat(dds_peer)`: CycloneDDS `DomainParticipant` + lifecycle
3. `feat(dds_peer)`: SPDP/SEDP discovery loop + ROS2 topic introspection (`rt/` → `/` mapping)
4. `feat(_ros_msgs)`: 28 bundled ROS2 message IDL stubs (std/geometry/sensor/nav/tf2/diag)
5. `feat(use_ros)`: agent-facing tool: `list_nodes`, `list_topics`, `echo`, `pub`, `types`
6. `feat(use_ros)`: `tail` / `untail` / `list_tails` bridge streaming samples into `event_bus`
7. `feat(devduck)`: register `dds_peer` + `use_ros` in default tool config
8. `docs`: user-facing README (docs/ros2-native-integration.md)

Live verification on Thor (NVIDIA AGX, aarch64)
- `/cmd_vel [geometry_msgs::msg::dds_::Twist_] (topic, known)` discovered
- `tail` emitted 5 `ros.cmd_vel` events to event_bus in 3s, respecting 2Hz cap
- `dds.start`, `dds.participant.join`, `dds.endpoint.new` all reaching event_bus

Not yet in this PR (roadmap continues)
- `use_ros` + `use_aws` (Rekognition/Bedrock) for Image topics
- `use_ros` + `use_google` (Vision API, OCR)
- `use_agents` routing for spatial (tf/odom), diagnostic, vision streams
Those are tracked in docs/research/ros2-native-integration.md §9.

Risks / notes
- Requires `cyclonedds` at runtime. Hosts without it get a clean error from `dds_peer.start()` instead of a crash; the agent stays healthy.
- The receive rate shown by `list_tails` is noisy (instantaneous EMA). The actual emit cap is correct. Will polish alongside the vision pipeline.