json output and caching#191
Conversation
The human-readable output of dt-validate makes it inconvenient to integrate DeviceTree validation in tools and automatic workflows. Introduce support for producing JSON formatted output and allow the user to select this through the new --format argument. The default output of text mode is retained. The generated JSON output is an array of objects containing the human readable "formatted" member, in addition to the various components presented in structured form across the "file", "node", "nodename", "schema", "message", and a few additional presentable insights into the error/warning. This allows build tooling, editors, CI systems, and other consumers to process dt-validate results directly while preserving the current output for existing users. Add coverage for the diagnostic serialization helpers and a CLI-level test which verifies that --format json produces parseable JSON. Assisted-by: Codex:gpt-5.3-codex Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
dt-validate formats diagnostics with absolute DTB paths. This makes the output unnecessarily noisy in the typical case of it being run from the root of a source tree, such as the Linux kernel, but perhaps more importantly, it prevents/complicates comparison of validation results between different source trees. Report DTB paths relative to the current working directory when the validated file is below it, while preserving absolute paths for files outside the current tree. Apply the same display path to text diagnostics and JSON diagnostics. This keeps diagnostics useful for files outside the tree, but makes in-tree validation output stable across different checkout locations. Assisted-by: Codex:gpt-5.3-codex Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
In the workflow of applying series of DeviceTree source patches, maintainers often validate a set of DTBs repeatedly against the same processed schema. Most DTBs are unchanged between commits, but dt-validate still re-runs the full schema validation for each one - unless such efforts are prevented by external tooling/wrappers. Introduce an optional cache mechanism, controlled through the --cache-dir argument, which caches the validation output per DTB. The cache is keyed on dtschema version, schema content hash, DTB content hash, and validation options which affect the output. The cached diagnostics is stored in structured form, allowing it to be retreived in either text or json format. Cache entried are not keyed on DTB filename, making it possible to share cache across multiple source trees. On a modest test machine, it's possible to revalidate (hot cache) the almost 400 Qualcomm Arm64 DTBs in about 0.3 seconds, eliminating the need for external tricks to avoid validation work. Assisted-by: Codex:gpt-5.3-codex Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
|
This all looks great! Do you have patches for the kernel to enable caching? One issue is adding a cmd line option then ups the version requirement in the kernel. Can we enable caching by default? Also, we'd need to make sure a 'make clean' clears the cache. |
Thank you
Not yet, for the kernel build I've only triggered it explicitly through the make flags. The main use case for me is to replace the various wrapper scripts I have in my maintainer workflow, where I validate patches on a rc-based branch against processed-schema.json from latest linux-next, and then compare the validation results before and after each commit in my tree...
I think the default behavior when run within e.g. the kernel should be to use something like $KBUILD_OUTPUT/.dt-cache, but blindly making that the default doesn't sound like the correct choice. In particular since I think we want to avoid doing cache management in dt-validate (i.e. tying it to make clean in the kernel is much cleaner).
How about we allow caching to be enabled either with --cache-dir or a DT_VALIDATE_CACHE_DIR environment variable? We can then introduce a change in the kernel that sets this and implements the "clean" operation, this should work seamlessly in both directions. |
We already do some checking if the processed schema is out of date just checking the version. But we don't clean, just error out and it's the caller's problem to clean.
I'd rather not support both in dtschema, but I suppose the kernel could be an env var/makeflag which then causes --cache-dir to be set. Essentially, that is what DT_SCHEMA_FILES does. I am a little reluctant to add another makeflag as we've already got a couple. Though this case we could drop it and bump the dtschema min version at some point. Effectively, we require using the latest version anyways and who knows what using the current min with the current kernel tree does. I'm pretty sure there would be some missing schemas in dtschema that the kernel schemas reference. So I'm not totally opposed to just bumping the min version. I think we should have the min released for say 1 kernel cycle before we do that. |
That's fair.
We effectively already have that, in that DT_CHECKER_FLAGS='--cache-dir .dt-cache' triggers the caching behavior already with this proposed implementation. We could leave it at that for now and next time we bump the min-version we make this the default.
I agree, let's try to keep things simple. My proposal for environment variable was merely to pass the --cache-dir in a form where it would be ignored when the user have an older dt-validate version. |
|
Do we need the empty output?: |
| sg.check_dtb(filename) | ||
|
|
||
| if args.format == 'json': | ||
| print(json.dumps(sg.diagnostics, indent=2)) |
There was a problem hiding this comment.
Doesn't this need to go to stderr? If I do a quiet build, I don't get anything.
However, if there are any other warnings/errors in the build, then they are going to corrupt the json. You need to be able to output the json to a file. Perhaps it shouldn't be either text or json output, but always text output with optional json output to a file.
In the DeviceTree source maintainer workflow, dt-validate is run repeatedly to catch regressions as well as improvements. This typically result in many similar validation passes against the same processed schema, with the need for structured output that can be compared between the runs.
Introduce the ability for dt-validate to produce json-formatted output of the validation results, and introduce a caching mechanism to avoid reevaluating unmodified files.