
SecurityRonin/dar-forensic


dar-forensic


Pure-Rust reader for Denis Corbin DAR (Disk ARchiver) archives — the format mobile-forensics tools (Passware Kit Mobile, Cellebrite) use for full-filesystem extractions. Enumerates the catalog, seeks straight to any file for random-access extraction — transparently decompressing gzip, bzip2, xz, zstd, lz4 and lzo, and reading multi-volume (sliced) archives — and is hardened to be pointed safely at untrusted evidence. Zero unsafe, no GPL, no C bindings.

Two crates

| Crate | Role | crates.io |
|---|---|---|
| dar-core | read-only parser: open, enumerate, seek-extract, CRC-verify | cargo add dar-core |
| dar-forensic | forensic-grade reader + anomaly auditor (audit() → graded findings, write_bodyfile()) | cargo add dar-forensic |

dar-forensic re-exports the full dar-core reader, so the analyzer crate alone is enough for forensic work:

[dependencies]
dar-forensic = "0.7"

Quick start

use std::fs::File;
use dar_forensic::DarReader;

// `open` takes anything Read + Seek — a File, or a Cursor over bytes.
let mut reader = DarReader::open(File::open("userdata.1.dar")?)?;

for entry in reader.entries() {
    println!("{} ({} bytes)", entry.path_lossy(), entry.size);
}

// Extract one file — a direct seek to its catalog offset, no scanning.
let data = reader.extract("root/etc/hostname")?;
println!("{}", String::from_utf8_lossy(&data));

// Integrity check — recompute the stored per-file CRC over the data.
println!("{}", reader.verify("root/etc/hostname")?); // CRC match | CRC mismatch: …

// Forensic audit — flag catalogue anomalies (metadata only, no data read).
for finding in reader.audit() {
    // e.g. [MEDIUM] DAR-PATH-TRAVERSAL: entry `../../etc/cron.d/x` contains a `..` …
    eprintln!("{finding}");
}

// Timeline export — write a Sleuth Kit bodyfile straight into `mactime`.
reader.write_bodyfile(&mut std::io::stdout())?;
# Ok::<(), dar_forensic::DarError>(())
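write_bodyfile() emits Sleuth Kit bodyfile lines for mactime (for example `mactime -b body.txt`). The sketch below only illustrates the TSK 3.x bodyfile line layout. BodyEntry and bodyfile_line are hypothetical names, not the crate's API, and how dar-forensic actually fills each field (inode numbers, mode string, crtime) is an assumption here.

```rust
/// A catalogue entry reduced to what one bodyfile line needs.
/// Hypothetical type for illustration; not dar-forensic's entry type.
struct BodyEntry<'a> {
    path: &'a str,
    inode: u64,
    mode: &'a str, // TSK-style mode string, e.g. "r/rrw-r--r--"
    uid: u32,
    gid: u32,
    size: u64,
    atime: i64, // Unix seconds
    mtime: i64,
    ctime: i64,
}

/// Format one TSK 3.x bodyfile line:
/// MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime
fn bodyfile_line(e: &BodyEntry<'_>) -> String {
    // Metadata only: no content hash is computed, so MD5 is written as "0".
    // This sketch also writes crtime as 0 (unknown).
    format!(
        "0|{}|{}|{}|{}|{}|{}|{}|{}|{}|0",
        e.path, e.inode, e.mode, e.uid, e.gid, e.size, e.atime, e.mtime, e.ctime
    )
}

fn main() {
    let e = BodyEntry {
        path: "/etc/hostname",
        inode: 12,
        mode: "r/rrw-r--r--",
        uid: 0,
        gid: 0,
        size: 9,
        atime: 1_700_000_000,
        mtime: 1_700_000_000,
        ctime: 1_700_000_000,
    };
    println!("{}", bodyfile_line(&e));
}
```

Writing one fully formatted line per entry keeps the output streamable, so a large catalogue never has to be buffered before it reaches mactime.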

What makes this different

DAR is a C++ format; the reference implementation (libdar) is GPL with C bindings, and the dar name on crates.io is an empty placeholder. dar-forensic is the first standalone, dependency-light Rust reader — and it is built for forensic use, where the archive is evidence from a potentially hostile source:

| Capability | libdar (C++) | dar-forensic |
|---|---|---|
| Language / linkage | C++, GPL, C FFI | pure Rust, MIT, unsafe_code = "deny" |
| Reads DAR formats 1–11 | | ✅ (1 + 7–11 validated against real archives) |
| Tape-marks-disabled archives (Passware / mobile) | | |
| Random-access extraction (Read + Seek) | | ✅ (composes with ewf, vmdk, …) |
| Transparent gzip / bzip2 / xz / zstd / lz4 / lzo decompression | | ✅ (pure-Rust decoders, no C) |
| Multi-volume (sliced) archives | | ✅ (open_slices(); file data spans slices transparently) |
| Tail-scan for 90+ GiB archives (≈107 MiB read, not 99 GiB) | | |
| Forensic anomaly audit (audit() → severity-graded findings) | | ✅ (incomplete catalogue, path-traversal, absolute path, …; serde-exportable) |
| Timeline export (Sleuth Kit bodyfile → mactime) | | ✅ (write_bodyfile() straight from the catalogue) |
| Hardened against malicious input (no panic / OOM / backward seek) | | |
| Continuous fuzzing | | cargo fuzz |
| 100% line coverage, CI-enforced | | |
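For multi-volume archives, file data can cross a slice boundary. One way to picture that is to translate a logical offset into a slice index plus a local offset, and to split a read that crosses a boundary into per-slice pieces. This is a minimal sketch under a loud assumption: each slice contributes a known number of payload bytes, and the per-slice headers real dar slices carry are ignored. locate and split_read are hypothetical helpers, not dar-core functions.

```rust
/// Map a logical offset to (slice index, offset within that slice).
/// `slice_lens` are the payload lengths of each slice, in order.
/// Hypothetical helper: ignores per-slice headers.
fn locate(slice_lens: &[u64], mut offset: u64) -> Option<(usize, u64)> {
    for (i, &len) in slice_lens.iter().enumerate() {
        if offset < len {
            return Some((i, offset));
        }
        offset -= len;
    }
    None // past the end of the last slice
}

/// Split a read of `len` bytes at `offset` into per-slice pieces:
/// (slice index, offset in slice, bytes to read there).
/// Returns None if the read would run past the last slice.
fn split_read(slice_lens: &[u64], offset: u64, mut len: u64) -> Option<Vec<(usize, u64, u64)>> {
    let (mut idx, mut off) = locate(slice_lens, offset)?;
    let mut pieces = Vec::new();
    while len > 0 {
        let avail = *slice_lens.get(idx)? - off;
        let take = avail.min(len);
        pieces.push((idx, off, take));
        len -= take;
        idx += 1; // continue at the start of the next slice
        off = 0;
    }
    Some(pieces)
}

fn main() {
    let lens = [100, 100, 50];
    // A 120-byte read starting at offset 90 touches all three slices.
    println!("{:?}", split_read(&lens, 90, 120));
}
```

Keeping the mapping as pure arithmetic over slice lengths means the same logic works whether the slices are files on disk or streams inside a container.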

Note on the "Passware variant"

Archives written by Passware Kit Mobile have no seqt_catalogue escape, which once looked like a vendor-specific format. It isn't: the escape is an optional sequential-read tape mark, and Passware simply writes archives with tape marks disabled (equivalent to dar -at). They are standard DAR — official dar reads them too. dar-forensic locates the catalog by its ref_data_name label in that case (a real structural field, the same 10 bytes as the slice label), so it reads both tape-marked and tape-mark-free archives.
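Because the label only has to be searched for near the end of the archive, the search can stay bounded even on very large inputs (the comparison table quotes roughly 107 MiB read for a 90+ GiB archive). The sketch below shows a bounded tail search, assuming the 10-byte label is already known from the slice header and that a plain byte match yields a candidate position; the real parser validates far more than this. find_label_in_tail is a hypothetical helper, not dar-core's API.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Read at most `window` bytes from the end of `r` and return the absolute
/// offset of the LAST occurrence of `label` inside that window, if any.
/// Hypothetical helper: a byte match is only a candidate, not a parsed catalogue.
fn find_label_in_tail<R: Read + Seek>(r: &mut R, label: &[u8], window: u64) -> io::Result<Option<u64>> {
    if label.is_empty() {
        return Ok(None); // `windows(0)` would panic
    }
    let len = r.seek(SeekFrom::End(0))?;
    let start = len.saturating_sub(window); // never read more than `window` bytes
    r.seek(SeekFrom::Start(start))?;
    let mut tail = vec![0u8; (len - start) as usize];
    r.read_exact(&mut tail)?;
    // Scan from the end: the catalogue sits near the tail of the archive.
    let pos = tail.windows(label.len()).rposition(|w| w == label);
    Ok(pos.map(|p| start + p as u64))
}

fn main() -> io::Result<()> {
    let mut data = vec![0u8; 1000];
    data[900..910].copy_from_slice(b"LABEL12345");
    let mut c = Cursor::new(data);
    println!("{:?}", find_label_in_tail(&mut c, b"LABEL12345", 200)?);
    Ok(())
}
```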

Anomaly codes

audit() reads the catalogue only (no entry data) and returns severity-graded Anomaly values, most-severe first. Each carries a stable, machine-readable code (a published contract), a severity, and a human-readable note. Findings are observations, not verdicts — the analyst draws the conclusion.

| Code | Severity | What it flags |
|---|---|---|
| DAR-CATALOG-INCOMPLETE | High | Catalogue ended early: fewer entries recovered than the archive claims (truncation or corruption) |
| DAR-PATH-ABSOLUTE | Medium | Entry path begins with /, so extraction would land outside the intended root |
| DAR-PATH-TRAVERSAL | Medium | Entry path contains a .. component (directory traversal on extraction) |
| DAR-PATH-DUPLICATE | Low | The same path appears more than once in the catalogue |
| DAR-TIME-FUTURE | Low | An atime/mtime/ctime is far in the future (possible timestamp tampering) |
| DAR-NAME-CONTROL | Low | Entry name contains control characters (< 0x20 or 0x7f): terminal injection / concealment |

With the serde feature, Anomaly is Serialize for JSON/structured export.
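Since audit() works from catalogue metadata alone, checks like the path rules above reduce to string inspection plus a severity sort. The sketch below assumes a flat list of /-separated path strings; Severity, Finding and audit_paths are hypothetical names, and the crate's real rules and note text may differ. DAR-CATALOG-INCOMPLETE and DAR-TIME-FUTURE are left out because they need the claimed entry count and the timestamps.

```rust
use std::collections::HashSet;

/// Declared low-to-high, so the derived Ord gives High > Medium > Low.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
}

#[derive(Debug)]
struct Finding {
    code: &'static str,
    severity: Severity,
    path: String,
}

/// Shape checks for one path, mirroring three codes from the table.
fn path_findings(path: &str) -> Vec<Finding> {
    let mk = |code: &'static str, severity: Severity| Finding { code, severity, path: path.to_string() };
    let mut out = Vec::new();
    if path.starts_with('/') {
        out.push(mk("DAR-PATH-ABSOLUTE", Severity::Medium));
    }
    if path.split('/').any(|c| c == "..") {
        out.push(mk("DAR-PATH-TRAVERSAL", Severity::Medium));
    }
    if path.chars().any(|c| (c as u32) < 0x20 || c as u32 == 0x7f) {
        out.push(mk("DAR-NAME-CONTROL", Severity::Low));
    }
    out
}

/// Run the per-path checks, flag repeated paths, then sort most-severe first.
/// sort_by is stable, so equal-severity findings keep the order they were produced in.
fn audit_paths(paths: &[&str]) -> Vec<Finding> {
    let mut findings: Vec<Finding> = paths.iter().flat_map(|p| path_findings(p)).collect();
    let mut seen = HashSet::new();
    for p in paths {
        if !seen.insert(*p) {
            findings.push(Finding { code: "DAR-PATH-DUPLICATE", severity: Severity::Low, path: p.to_string() });
        }
    }
    findings.sort_by(|a, b| b.severity.cmp(&a.severity));
    findings
}

fn main() {
    let paths = ["root/etc/hostname", "../../etc/cron.d/x", "/abs", "root/etc/hostname"];
    for f in audit_paths(&paths) {
        println!("[{:?}] {}: {}", f.severity, f.code, f.path);
    }
}
```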

Format support

| DAR format | version_string | Status |
|---|---|---|
| Format 11 (dar 2.7–2.8) | "0;3" (11.3) | Supported; validated against a dar 2.8.5 fixture |
| Format 10 (dar 2.6) | "0:1" | Supported; validated against a dar 2.6.16 fixture |
| Format 9 (dar 2.5) | "090" | Supported; validated against a dar 2.5.3 fixture and a real 92 GiB Passware archive |
| Format 8 (dar 2.4) | "081" | Supported; validated against a dar 2.4.24 fixture |
| Format 7 (dar 2.3) | "07" | Supported; validated against a dar 2.3.12 fixture |
| Formats 2–6 (dar 2.0–2.3) | "02" to "06" | Same legacy grammar as 7; parsed but not yet validated against a fixture |
| Format 1 (dar 1.0.x) | "01" | Supported; validated against a real dar 1.0.0 archive (flagless inode, size·offset cat_file, no CRC) |
| Tape marks | on or off | Both supported (e.g. Passware writes them off) |
| Archive creation / writing | | Not supported (reader only) |

The format version comes from the header's version_string: each byte stores its numeric value plus 48, i.e. ASCII '0' plus the value ("090" → 9, "0:1" → 10.1). Formats ≤ 7 are structurally different: there is no seqt_catalogue escape (the catalog is located via the end terminateur trailer), uid/gid are u16, timestamps are bare seconds, and the CRC is a fixed 2 bytes. Format 1 goes further still, with no inode flag byte and a size·offset-only file record with no CRC. Compressed pre-8 archives carry no per-entry codec byte, so the archive-global codec drives both the catalog and every entry. The full per-version layout, reverse-documented from the authoritative libdar source, is in docs/implementation-notes.md §11–§12.
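The version rule and the pre-8 codec rule above are both small enough to express directly. This is a minimal sketch, assuming version strings always have the leading-'0' two- or three-byte shapes shown in the format table; decode_version, is_legacy, Codec and entry_codec are hypothetical names, not dar-core's API.

```rust
/// Decode a DAR header version_string. Each byte stores its value plus 48
/// (ASCII '0' + n), so ':' is 10 and ';' is 11. Returns (format, sub-version).
/// Assumes the leading-'0' shapes from the format table.
fn decode_version(s: &[u8]) -> Option<(u8, Option<u8>)> {
    // Any byte below 48 cannot encode a value: reject instead of wrapping.
    let digits: Vec<u8> = s.iter().map(|b| b.checked_sub(48)).collect::<Option<_>>()?;
    match digits.as_slice() {
        [0, major] => Some((*major, None)),
        [0, major, minor] => Some((*major, Some(*minor))),
        _ => None,
    }
}

/// Formats 7 and earlier use the legacy layout (no seqt_catalogue escape,
/// u16 uid/gid, bare-seconds timestamps, 2-byte CRC).
fn is_legacy(format: u8) -> bool {
    format <= 7
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    Uncompressed,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Lz4,
    Lzo,
}

/// Pick the codec for one entry's data. Pre-8 archives have no per-entry
/// codec byte, so the archive-global codec applies. From format 8 on, this
/// sketch expects the entry's own byte and returns None if it is missing.
fn entry_codec(format: u8, archive: Codec, per_entry: Option<Codec>) -> Option<Codec> {
    if format < 8 { Some(archive) } else { per_entry }
}

fn main() {
    let samples: [&[u8]; 5] = [b"090", b"0:1", b"0;3", b"07", b"01"];
    for v in samples {
        println!("{:?}", decode_version(v));
    }
    println!("{:?}", entry_codec(7, Codec::Gzip, Some(Codec::Xz)));
}
```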

Scope and limits

  • Read-only — does not create or modify archives.
  • Decompression: gzip, bzip2, xz, zstd, lz4, lzo — all six are transparently inflated for both the compressed catalog and extracted entry data (pure-Rust decoders, bounded against decompression bombs), in both dar's single-stream and per-block (block_compressor) modes. Encrypted entries are listed but extract() returns a clear error rather than wrong bytes — decryption is out of scope.
  • All codecs always compiled — a forensic reader must read every variant it encounters, so the six decompression codecs are not optional Cargo features. The only optional feature is serde (structured audit() export).
  • CRC verification: verify(path) recomputes libdar's per-file CRC over the decompressed data and compares it with the value stored in the catalogue, returning Match, Mismatch { stored, computed }, or NotStored (edition-1 archives record no CRC). It never withholds the bytes: data that fails its CRC can still be extracted so the corruption can be analysed.
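The three verify() outcomes, and the rule that data is never withheld, can be modelled with a small enum. In this sketch CrcCheck, compare_crc and verify_and_keep are hypothetical stand-ins for the crate's types, and toy_xor_checksum is deliberately NOT libdar's CRC algorithm; it only exists so the example runs.

```rust
/// Outcome of comparing a stored per-file CRC with a recomputed one.
#[derive(Debug, PartialEq, Eq)]
enum CrcCheck {
    Match,
    Mismatch { stored: Vec<u8>, computed: Vec<u8> },
    NotStored, // e.g. edition-1 archives
}

fn compare_crc(stored: Option<&[u8]>, computed: &[u8]) -> CrcCheck {
    match stored {
        None => CrcCheck::NotStored,
        Some(s) if s == computed => CrcCheck::Match,
        Some(s) => CrcCheck::Mismatch { stored: s.to_vec(), computed: computed.to_vec() },
    }
}

/// Verification never withholds data: the bytes come back with the verdict,
/// so content that fails its CRC can still be examined.
fn verify_and_keep(data: Vec<u8>, stored: Option<&[u8]>, checksum: impl Fn(&[u8]) -> Vec<u8>) -> (Vec<u8>, CrcCheck) {
    let computed = checksum(&data);
    let verdict = compare_crc(stored, &computed);
    (data, verdict)
}

/// Placeholder checksum for illustration only: NOT libdar's CRC.
fn toy_xor_checksum(data: &[u8]) -> Vec<u8> {
    vec![data.iter().fold(0u8, |acc, b| acc ^ b)]
}

fn main() {
    let (bytes, verdict) = verify_and_keep(b"abc".to_vec(), Some(&[0xFF][..]), toy_xor_checksum);
    println!("{} bytes kept, verdict: {:?}", bytes.len(), verdict);
}
```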

Security

dar-forensic is designed to be run on archives from potentially compromised or adversarial sources:

  • No panics on malicious input — every attacker-controlled length and offset is bounds- or overflow-checked.
  • No allocation bombs — a forged stored_size is validated against the real archive length before any allocation.
  • No backward seeks — a length that would cast to a negative i64 seek is rejected.
  • Bounded decoding — infinints are u64-or-Corrupt (never silently truncated); NUL-terminated names are length-capped; the terminateur scan is bounded.
  • Zero unsafe and continuously fuzz-tested.
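Several of the guarantees above come down to checked arithmetic performed before any allocation or seek. The sketch below shows one plausible shape for each: span validation against the archive length, rejecting lengths that would wrap to a negative i64 seek, overflow-checked integer decoding, and a capped decompression read. ParseError and all function names are hypothetical, and be_to_u64 illustrates only the "u64-or-Corrupt" policy, not libdar's actual infinint wire format.

```rust
use std::io::Read;

/// Error kinds for this sketch (hypothetical; not dar-core's DarError).
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    Overflow,
    OutOfBounds,
    TooLarge,
    Io,
}

/// Check an attacker-supplied (offset, stored_size) pair against the real
/// archive length before any buffer of that size is allocated.
fn checked_span(offset: u64, stored_size: u64, archive_len: u64) -> Result<usize, ParseError> {
    let end = offset.checked_add(stored_size).ok_or(ParseError::Overflow)?;
    if end > archive_len {
        return Err(ParseError::OutOfBounds);
    }
    usize::try_from(stored_size).map_err(|_| ParseError::Overflow)
}

/// A length used for a relative forward seek must fit a non-negative i64;
/// anything larger would wrap to a negative (backward) seek, so reject it.
fn forward_seek_delta(len: u64) -> Result<i64, ParseError> {
    i64::try_from(len).map_err(|_| ParseError::Overflow)
}

/// Fold big-endian bytes into a u64, failing rather than truncating.
fn be_to_u64(bytes: &[u8]) -> Result<u64, ParseError> {
    bytes.iter().try_fold(0u64, |acc, &b| {
        acc.checked_mul(256)
            .and_then(|v| v.checked_add(u64::from(b)))
            .ok_or(ParseError::Overflow)
    })
}

/// Read a decoder's output through a hard cap. Taking limit + 1 bytes lets an
/// over-long (bomb-like) stream be detected without reading all of it.
fn read_bounded<R: Read>(decoder: R, limit: u64) -> Result<Vec<u8>, ParseError> {
    let mut out = Vec::new();
    decoder
        .take(limit.saturating_add(1))
        .read_to_end(&mut out)
        .map_err(|_| ParseError::Io)?;
    if out.len() as u64 > limit {
        return Err(ParseError::TooLarge);
    }
    Ok(out)
}

fn main() {
    println!("{:?}", checked_span(90, 20, 100)); // span ends past the archive
    println!("{:?}", forward_seek_delta(u64::MAX));
    println!("{:?}", be_to_u64(&[1, 0]));
    println!("{:?}", read_bounded(&b"hello world"[..], 5));
}
```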

Running the fuzz target

rustup install nightly
cargo install cargo-fuzz
# three targets: the parser (fuzz_open), full read+extract (fuzz_read),
# and the audit pipeline (fuzz_forensic)
cargo +nightly fuzz run fuzz_open

Testing

187 tests — unit (private helpers + every error branch), synthetic-archive integration, and real-fixture integration — at 100% library line coverage, enforced in CI (cargo llvm-cov, lcov gate), with a second gate that holds the public-API (tests/) suite to the same bar. Committed, reproducible fixtures cover formats 7–11 (one per dar release), all six dar -z codecs (gzip/bzip2/xz/zstd/lz4/lzo), and per-block and multi-volume (sliced) archives. Parsing was additionally validated byte-for-byte against a real dar-1.0.0 edition-1 archive, a confidential 92 GiB Passware Kit Mobile archive (format 9, 637,698 entries), and a real 52 GB Android extraction re-sliced into 13 volumes with dar_xform (302,401 entries; every extraction byte-identical to the single-file reader) — none committed. That last, real archive caught two bugs no synthetic fixture could (see docs/implementation-notes.md). The parser survives millions of cargo fuzz executions with zero crashes.

cargo test
cargo install cargo-llvm-cov && cargo llvm-cov --lcov --output-path lcov.info

The --summary-only line percentage can read slightly under 100% because the generic, reader-agnostic functions are monomorphized once per reader type across the test binaries; the lcov merge (and --show-missing-lines) confirms no source line is left uncovered.
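The monomorphization effect shows up with any generic function. In the minimal illustration below (count_bytes is a hypothetical helper), each call with a different reader type makes the compiler emit a separate copy of the function. Coverage counters are kept per copy, which is one plausible way a summary can report a line as partly missed even though every source line ran somewhere.

```rust
use std::io::{self, Cursor, Read};

/// A reader-agnostic helper. The compiler emits one copy of this function
/// per concrete `R` it is called with.
fn count_bytes<R: Read>(mut r: R) -> io::Result<u64> {
    io::copy(&mut r, &mut io::sink())
}

fn main() -> io::Result<()> {
    let a = count_bytes(Cursor::new(vec![0u8; 4]))?; // count_bytes::<Cursor<Vec<u8>>>
    let b = count_bytes(&b"abc"[..])?; // count_bytes::<&[u8]>
    println!("{a} {b}");
    Ok(())
}
```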

Related crates

dar-forensic reads the files inside a DAR archive. When the archive itself is wrapped in a disk-image container, these crates provide the same Read + Seek interface to feed it:

| Crate | Format |
|---|---|
| ewf | E01 / Expert Witness Format (EnCase, FTK Imager) |
| aff4 | AFF4 v1 (Evimetry) |
| vmdk | VMware VMDK |
| vhdx | Microsoft VHDX (Hyper-V, Azure) |
| vhd | Legacy VHD |
| qcow2 | QEMU / KVM QCOW2 |
| ufed | Cellebrite UFED |
| dd | Raw / flat / dd images |
| iso9660-forensic | ISO 9660 optical media |
| dmg | Apple DMG / UDIF |
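Because DarReader::open accepts anything Read + Seek, an archive stored at some offset inside a larger image only needs a thin adapter that presents that byte range as a stream starting at zero. The sketch below is one way to write such an adapter. Window is hypothetical, and none of the listed crates' APIs are assumed; in practice the inner reader would be whatever Read + Seek type the container crate provides.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Present bytes [start, start + len) of an inner Read + Seek as a
/// standalone Read + Seek stream. Hypothetical adapter for illustration.
struct Window<R> {
    inner: R,
    start: u64,
    len: u64,
    pos: u64, // position relative to `start`
}

impl<R: Read + Seek> Window<R> {
    fn new(inner: R, start: u64, len: u64) -> Self {
        Window { inner, start, len, pos: 0 }
    }
}

impl<R: Read + Seek> Read for Window<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.len.saturating_sub(self.pos);
        if remaining == 0 || buf.is_empty() {
            return Ok(0); // end of window
        }
        // Never read past the end of the window.
        let want = buf.len().min(usize::try_from(remaining).unwrap_or(usize::MAX));
        let abs = self.start.checked_add(self.pos).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "offset overflow")
        })?;
        self.inner.seek(SeekFrom::Start(abs))?;
        let n = self.inner.read(&mut buf[..want])?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for Window<R> {
    fn seek(&mut self, to: SeekFrom) -> io::Result<u64> {
        let target = match to {
            SeekFrom::Start(o) => Some(o),
            SeekFrom::End(d) => self.len.checked_add_signed(d),
            SeekFrom::Current(d) => self.pos.checked_add_signed(d),
        };
        match target {
            Some(p) => {
                self.pos = p;
                Ok(p)
            }
            None => Err(io::Error::new(io::ErrorKind::InvalidInput, "seek before start of window")),
        }
    }
}

fn main() -> io::Result<()> {
    let image: Vec<u8> = (0u8..100).collect();
    let mut w = Window::new(Cursor::new(image), 10, 20);
    let mut buf = Vec::new();
    w.read_to_end(&mut buf)?;
    println!("{:?}", buf);
    Ok(())
}
```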

For forensic integrity analysis of container formats:

| Crate | Format |
|---|---|
| ewf-forensic | E01 structural audit, Adler-32 / MD5 repair |
| vhdx-forensic | VHDX integrity analysis |

Privacy Policy · Terms of Service · © 2026 Security Ronin Ltd

About

Pure-Rust forensic reader + anomaly auditor for Denis Corbin DAR (Disk ARchiver) archives, incl. Passware Kit Mobile / Cellebrite mobile extractions; formats 1-11, transparent gzip/bzip2/xz/zstd/lz4/lzo, multi-volume, hardened and fuzz-tested. dar-core reader + dar-forensic analyzer.
