dar-forensic

Pure-Rust reader for Denis Corbin's DAR (Disk ARchiver) archives, the format that mobile-forensics tools (Passware Kit Mobile, Cellebrite) use for full-filesystem extractions. It enumerates the catalog and seeks straight to any file for random-access extraction, transparently decompressing gzip, bzip2, xz, zstd, lz4 and lzo and reading multi-volume (sliced) archives, and it is hardened so that it can safely be pointed at untrusted evidence. Zero unsafe, no GPL, no C bindings.

Two crates

Crate	Role	crates.io
`dar-core`	read-only parser — open, enumerate, seek-extract, CRC-verify	`cargo add dar-core`
`dar-forensic`	forensic-grade reader + anomaly auditor (`audit()` → graded findings, `write_bodyfile()`)	`cargo add dar-forensic`

dar-forensic re-exports the full dar-core reader, so the analyzer crate alone is enough for forensic work:

[dependencies]
dar-forensic = "0.7"

Quick start

use std::fs::File;
use dar_forensic::DarReader;

// `open` takes anything Read + Seek — a File, or a Cursor over bytes.
let mut reader = DarReader::open(File::open("userdata.1.dar")?)?;

for entry in reader.entries() {
    println!("{} ({} bytes)", entry.path_lossy(), entry.size);
}

// Extract one file — a direct seek to its catalog offset, no scanning.
let data = reader.extract("root/etc/hostname")?;
println!("{}", String::from_utf8_lossy(&data));

// Integrity check — recompute the stored per-file CRC over the data.
println!("{}", reader.verify("root/etc/hostname")?); // CRC match | CRC mismatch: …

// Forensic audit — flag catalogue anomalies (metadata only, no data read).
for finding in reader.audit() {
    // e.g. [MEDIUM] DAR-PATH-TRAVERSAL: entry `../../etc/cron.d/x` contains a `..` …
    eprintln!("{finding}");
}

// Timeline export: write a Sleuth Kit bodyfile to stdout, ready to pipe into `mactime`.
reader.write_bodyfile(&mut std::io::stdout())?;
# Ok::<(), dar_forensic::DarError>(())
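
For readers unfamiliar with the bodyfile that `mactime` consumes, the sketch below formats one line in the Sleuth Kit 3.x layout (`MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime`). It is an illustration only: `BodyEntry` and `bodyfile_line` are hypothetical names, and writing `0` for the MD5, inode and crtime fields is an assumption of this sketch, not a statement about which fields `write_bodyfile()` fills in.

```rust
/// Hypothetical, simplified view of one catalogue entry (not a dar-forensic type).
struct BodyEntry<'a> {
    name: &'a str,
    mode: &'a str, // mode as a string, e.g. "r/rrw-r--r--"
    uid: u32,
    gid: u32,
    size: u64,
    atime: i64, // Unix seconds
    mtime: i64,
    ctime: i64,
}

/// Format one Sleuth Kit 3.x bodyfile line: eleven fields separated by `|`.
/// Fields this sketch does not model (MD5, inode, crtime) are written as 0.
fn bodyfile_line(e: &BodyEntry) -> String {
    format!(
        "0|{}|0|{}|{}|{}|{}|{}|{}|{}|0",
        e.name, e.mode, e.uid, e.gid, e.size, e.atime, e.mtime, e.ctime
    )
}
```

A name that itself contains `|` would shift the columns, so a real exporter has to escape or replace that character; the sketch does not.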

What makes this different

DAR comes from the C++ world: the reference implementation (libdar) is GPL-licensed C++ that other languages reach through C bindings, and the dar name on crates.io is an empty placeholder. dar-forensic is the first standalone, dependency-light Rust reader, and it is built for forensic use, where the archive is evidence from a potentially hostile source:

	libdar (C++)	`dar-forensic`
Language / linkage	C++, GPL, C FFI	pure Rust, MIT, `unsafe_code = "deny"`
Reads DAR formats 1–11	✅	✅ (1 + 7–11 validated against real archives)
Tape-marks-disabled archives (Passware / mobile)	✅	✅
Random-access extraction (`Read + Seek`)	✅	✅ — composes with `ewf`, `vmdk`, …
Transparent gzip / bzip2 / xz / zstd / lz4 / lzo decompression	✅	✅ — pure-Rust decoders, no C
Multi-volume (sliced) archives	✅	✅ — `open_slices()`; file data spans slices transparently
Tail-scan for 90+ GiB archives (≈107 MiB read, not 99 GiB)	—	✅
Forensic anomaly audit (`audit()` → severity-graded findings)	—	✅ — incomplete catalogue, path-traversal, absolute path, … (serde-exportable)
Timeline export (Sleuth Kit bodyfile → `mactime`)	—	✅ — `write_bodyfile()` straight from the catalogue
Hardened against malicious input (no panic / OOM / backward seek)	—	✅
Continuous fuzzing	—	✅ `cargo fuzz`
100% line coverage, CI-enforced	—	✅

Note on the "Passware variant"

Archives written by Passware Kit Mobile have no seqt_catalogue escape, which once looked like a vendor-specific format. It isn't: the escape is an optional sequential-read tape mark, and Passware simply writes archives with tape marks disabled (equivalent to dar -at). They are standard DAR — official dar reads them too. dar-forensic locates the catalog by its ref_data_name label in that case (a real structural field, the same 10 bytes as the slice label), so it reads both tape-marked and tape-mark-free archives.

Anomaly codes

audit() reads the catalogue only (no entry data) and returns severity-graded Anomaly values, most-severe first. Each carries a stable, machine-readable code (a published contract), a severity, and a human-readable note. Findings are observations, not verdicts — the analyst draws the conclusion.

`code`	Severity	What it flags
`DAR-CATALOG-INCOMPLETE`	High	Catalogue ended early — fewer entries recovered than the archive claims (truncation or corruption)
`DAR-PATH-ABSOLUTE`	Medium	Entry path begins with `/` — extraction outside the intended root
`DAR-PATH-TRAVERSAL`	Medium	Entry path contains a `..` component — directory-traversal on extraction
`DAR-PATH-DUPLICATE`	Low	The same path appears more than once in the catalogue
`DAR-TIME-FUTURE`	Low	An `atime`/`mtime`/`ctime` is far in the future — possible timestamp tampering
`DAR-NAME-CONTROL`	Low	Entry name contains control characters (`< 0x20` or `0x7f`) — terminal-injection / concealment

With the serde feature, Anomaly is Serialize for JSON/structured export.
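
The three path-related rules in the table are simple enough to sketch on raw path bytes. The function below is a minimal illustration of those rules as worded above, not the crate's implementation: `path_anomaly_codes` is a hypothetical name, and applying the control-character check to the whole path rather than to each entry name is an assumption of this sketch.

```rust
/// Return the anomaly codes (as listed in the table) that a raw entry path
/// would trigger. Hypothetical helper; not part of the dar-forensic API.
fn path_anomaly_codes(path: &[u8]) -> Vec<&'static str> {
    let mut codes = Vec::new();

    // DAR-PATH-ABSOLUTE: the path begins with `/`.
    if path.first() == Some(&b'/') {
        codes.push("DAR-PATH-ABSOLUTE");
    }

    // DAR-PATH-TRAVERSAL: some `/`-separated component is exactly `..`.
    // A name such as `..b` is not a traversal component.
    if path
        .split(|&b| b == b'/')
        .any(|component| component == b"..".as_slice())
    {
        codes.push("DAR-PATH-TRAVERSAL");
    }

    // DAR-NAME-CONTROL: any byte below 0x20, or 0x7f (DEL).
    if path.iter().any(|&b| b < 0x20 || b == 0x7f) {
        codes.push("DAR-NAME-CONTROL");
    }

    codes
}
```

Working on bytes rather than `str` keeps the check independent of whether the stored name is valid UTF-8, which cannot be assumed for evidence from an arbitrary filesystem.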

Format support

DAR format	`version_string`	Status
Format 11 (dar 2.7–2.8)	`"0;3"` (11.3)	Supported — validated against a dar 2.8.5 fixture
Format 10 (dar 2.6)	`"0:1"`	Supported — validated against a dar 2.6.16 fixture
Format 9 (dar 2.5)	`"090"`	Supported — validated against a dar 2.5.3 fixture and a real 92 GiB Passware archive
Format 8 (dar 2.4)	`"081"`	Supported — validated against a dar 2.4.24 fixture
Format 7 (dar 2.3)	`"07"`	Supported — validated against a dar 2.3.12 fixture
Formats 2–6 (dar 2.0–2.3)	`"02"`–`"06"`	Same legacy grammar as 7; parsed but not yet validated against a fixture
Format 1 (dar 1.0.x)	`"01"`	Supported — validated against a real dar 1.0.0 archive (flagless inode, `size·offset` cat_file, no CRC)
Tape marks on or off	—	both supported (e.g. Passware writes them off)
Archive creation / writing	—	Not supported (reader only)

The format version is encoded in the header version_string, where each byte holds its numeric value plus 48 (ASCII '0'): "090" → 9, "0:1" → 10.1. Formats ≤ 7 are structurally different: they have no seqt_catalogue escape (the catalog is located via the terminateur trailer at the end of the archive), u16 uid/gid, bare-seconds timestamps, and a fixed 2-byte CRC. Format 1 goes further still, with no inode flag byte and a size·offset-only file record with no CRC. Compressed pre-8 archives carry no per-entry codec byte, so the archive-global codec drives both the catalog and every entry. The full per-version layout, reverse-documented from the authoritative libdar source, is in docs/implementation-notes.md §11–§12.
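
The version_string rule can be checked with a few lines of Rust. This is a sketch under stated assumptions, not the crate's parser: `decode_version` is a hypothetical helper, two-byte strings are treated as a bare major version with minor 0, and the first two bytes are assumed to combine into the major number with the first byte as the high-order (base-256) part. Every example in the table has a leading `'0'`, so that last assumption does not affect them.

```rust
/// Decode a DAR header `version_string` into (major, minor).
/// Hypothetical helper for illustration; not part of the dar-forensic API.
fn decode_version(version_string: &[u8]) -> Option<(u32, u32)> {
    // Each byte stores its value offset by 48 (ASCII '0'), so ':' is 10
    // and ';' is 11. Bytes below '0' are rejected.
    let digit = |b: u8| b.checked_sub(b'0').map(u32::from);

    match version_string {
        // Legacy two-byte form, e.g. "07": major only.
        &[hi, lo] => Some((digit(hi)? * 256 + digit(lo)?, 0)),
        // Three-byte form, e.g. "0:1": major from the first two bytes
        // (assumed base 256), minor from the third.
        &[hi, lo, fix] => Some((digit(hi)? * 256 + digit(lo)?, digit(fix)?)),
        _ => None,
    }
}
```

With this helper, `decode_version(b"0;3")` yields `Some((11, 3))` and `decode_version(b"081")` yields `Some((8, 1))`, matching the table.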

Scope and limits

Read-only — does not create or modify archives.
Decompression: gzip, bzip2, xz, zstd, lz4, lzo — all six are transparently inflated for both the compressed catalog and extracted entry data (pure-Rust decoders, bounded against decompression bombs), in both dar's single-stream and per-block (block_compressor) modes. Encrypted entries are listed but extract() returns a clear error rather than wrong bytes — decryption is out of scope.
All codecs always compiled — a forensic reader must read every variant it encounters, so the six decompression codecs are not optional Cargo features. The only optional feature is serde (structured audit() export).
CRC verification — verify(path) recomputes libdar's per-file CRC over the decompressed data and compares it to the value stored in the catalogue, returning Match, Mismatch { stored, computed }, or NotStored (edition-1 archives record no CRC). It never withholds the bytes: data that fails its CRC can still be extracted for analysis of the corruption.
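
The three outcomes of `verify()` described above can be modelled with a small enum. The sketch below is self-contained and hypothetical: the type name `CrcStatus`, the `Vec<u8>` field types and the `compare_crc` helper are assumptions made for illustration, and only the variant names come from the description above. It shows the comparison logic, not libdar's CRC algorithm.

```rust
/// Hypothetical model of the three verification outcomes.
#[derive(Debug, PartialEq)]
enum CrcStatus {
    Match,
    Mismatch { stored: Vec<u8>, computed: Vec<u8> },
    NotStored,
}

/// Compare a CRC recomputed over the decompressed data with the value
/// stored in the catalogue, if the archive recorded one at all.
fn compare_crc(stored: Option<&[u8]>, computed: &[u8]) -> CrcStatus {
    match stored {
        // The catalogue holds no CRC for this entry: nothing to compare.
        None => CrcStatus::NotStored,
        Some(s) if s == computed => CrcStatus::Match,
        // Report both values so the analyst can record the discrepancy.
        Some(s) => CrcStatus::Mismatch {
            stored: s.to_vec(),
            computed: computed.to_vec(),
        },
    }
}
```

Keeping the result separate from the extracted bytes is what lets a caller still extract and examine data whose CRC does not match.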

Security

dar-forensic is designed to be run on archives from potentially compromised or adversarial sources:

No panics on malicious input — every attacker-controlled length and offset is bounds- or overflow-checked.
No allocation bombs — a forged stored_size is validated against the real archive length before any allocation.
No backward seeks — a length that would cast to a negative i64 seek is rejected.
Bounded decoding — infinints are u64-or-Corrupt (never silently truncated); NUL-terminated names are length-capped; the terminateur scan is bounded.
Zero unsafe and continuously fuzz-tested.
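
The first three guarantees come down to checked arithmetic on attacker-controlled numbers before anything is allocated or seeked. The sketch below shows that pattern in isolation. The names `checked_range` and `RangeError` are hypothetical, and this is an illustration of the technique, not the crate's code.

```rust
use std::ops::Range;

/// Why an attacker-controlled (offset, size) pair was rejected.
#[derive(Debug, PartialEq)]
enum RangeError {
    Overflow,     // offset + size wraps around u64
    PastEnd,      // the range extends beyond the real archive length
    SeekTooLarge, // the size would become negative if cast to i64
}

/// Validate a stored offset and size against the actual archive length.
/// Only a range that passes every check may be allocated for or seeked to.
fn checked_range(
    offset: u64,
    stored_size: u64,
    archive_len: u64,
) -> Result<Range<u64>, RangeError> {
    // No panic: use checked addition instead of `+`, which could overflow.
    let end = offset
        .checked_add(stored_size)
        .ok_or(RangeError::Overflow)?;

    // No allocation bomb: a forged size cannot exceed what the file holds.
    if end > archive_len {
        return Err(RangeError::PastEnd);
    }

    // No backward seek: relative seeks take an i64, so reject any length
    // above i64::MAX rather than letting a cast turn it negative.
    if stored_size > i64::MAX as u64 {
        return Err(RangeError::SeekTooLarge);
    }

    Ok(offset..end)
}
```

For example, `checked_range(10, 20, 100)` returns `Ok(10..30)`, while a forged size of `u64::MAX` at offset 1 is rejected as `Overflow` without touching the allocator.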

Running the fuzz target

rustup install nightly
cargo install cargo-fuzz
# three targets: the parser (fuzz_open), full read+extract (fuzz_read),
# and the audit pipeline (fuzz_forensic)
cargo +nightly fuzz run fuzz_open

Testing

187 tests — unit (private helpers + every error branch), synthetic-archive integration, and real-fixture integration — at 100% library line coverage, enforced in CI (cargo llvm-cov, lcov gate), with a second gate that holds the public-API (tests/) suite to the same bar. Committed, reproducible fixtures cover formats 7–11 (one per dar release), all six dar -z codecs (gzip/bzip2/xz/zstd/lz4/lzo), and per-block and multi-volume (sliced) archives. Parsing was additionally validated byte-for-byte against a real dar-1.0.0 edition-1 archive, a confidential 92 GiB Passware Kit Mobile archive (format 9, 637,698 entries), and a real 52 GB Android extraction re-sliced into 13 volumes with dar_xform (302,401 entries; every extraction byte-identical to the single-file reader) — none committed. That last, real archive caught two bugs no synthetic fixture could (see docs/implementation-notes.md). The parser survives millions of cargo fuzz executions with zero crashes.

cargo test
cargo install cargo-llvm-cov && cargo llvm-cov --lcov --output-path lcov.info

The --summary-only line percentage can read slightly under 100% because the generic, reader-agnostic functions are monomorphized once per reader type across the test binaries; the lcov merge (and --show-missing-lines) confirms no source line is left uncovered.

Related crates

dar-forensic reads the files inside a DAR archive. When the archive itself is wrapped in a disk-image container, these crates provide the same Read + Seek interface to feed it:

Crate	Format
`ewf`	E01 / Expert Witness Format (EnCase, FTK Imager)
`aff4`	AFF4 v1 (Evimetry)
`vmdk`	VMware VMDK
`vhdx`	Microsoft VHDX (Hyper-V, Azure)
`vhd`	Legacy VHD
`qcow2`	QEMU / KVM QCOW2
`ufed`	Cellebrite UFED
`dd`	Raw / flat / dd images
`iso9660-forensic`	ISO 9660 optical media
`dmg`	Apple DMG / UDIF

For forensic integrity analysis of container formats:

Crate	Format
`ewf-forensic`	E01 structural audit, Adler-32 / MD5 repair
`vhdx-forensic`	VHDX integrity analysis
