Re: [PATCH] RFC: Python-based device-tree validation

On Mon, Apr 29, 2019 at 5:34 PM Simon Glass <sjg@xxxxxxxxxxxx> wrote:
>
> This is an attempt to illustrate an alternative method of validating
> device-tree schema, based on pylibfdt and Python-based schema files.
>
> Two tools are included:
>
>    validate_dts.py  - validates a .dts file based on schema files it
>                         finds in the kernel
>    binding_to_py.py - create Python schema files from an individual
>                         .txt binding file
>
> Both tools are just at the proof-of-concept stage. See the instructions
> at the top of each for usage.
>
> This DT validator came out of work in Chrome OS and was proven successful
> in providing a simple validator which is extremely flexible (e.g. custom
> code is easily added to check phandle targets and other non-trivial rules)
> and easy to use. It provides simple error messages when it finds a
> problem, and works directly from the compiled .dtb file. It is also quite
> fast even in the face of large schemas and large DT files.
>
> +Schema is handled by Python files in Documentation/devicetree/bindings,
> mirroring the existing .txt files. The above tool makes an attempt to
> convert .txt to .py, but it is very basic at present.
>
> To try this you also need the kernel patch:
>
>    RFC: Example schema files written in Python

I also replied to this, so I won't repeat what I said there.

> diff --git a/README.kernel_validate b/README.kernel_validate
> new file mode 100644
> index 0000000..9c0a871
> --- /dev/null
> +++ b/README.kernel_validate
> @@ -0,0 +1,118 @@
> +Kernel Device-Tree Validator based on Python and pylibfdt
> +=========================================================
> +
> +Background
> +----------
> +
> +As part of the discussions at Linux Plumbers 2018 [1] I showed a few people a
> +device-tree validator which uses a Python-based schema. This was developed for
> +Chrome OS to deal with the validation problem there.

I did go look at it then. :)

> +From my somewhat limited understanding of YAML, JSON and validation in that
> +world [2] it seems to me that it would be very difficult to get that technology
> +to validate the kernel DT files successfully: lots of complex tooling, regexes
> +and potentially a need to move to yaml for the source format.

I don't agree with this part. We're already able to do lots of
validation even though we have only converted over a handful of
bindings (getting people to fix the issues is the problem).

Complex tooling? It's 600 or so lines for the library and a couple of
tools at 50-100 lines each. That's less than this patch.

Regexes are very helpful and I would say required. For example, it's a
two-line addition to add type checking for all occurrences of
"^.*-supply$".

We only need yaml as an intermediate format. That's already in place in dtc.
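
To make that concrete, here is a rough sketch (written for this mail, not
taken from dt-schema or from your patch) of how a single patternProperties
rule over "^.*-supply$" can type-check every supply property once the DT is
instance data. The node contents and property names are made up:

# Illustrative only: roughly what a type check for "^.*-supply$" looks
# like once the DT is plain instance data. Not the real dt-schema
# meta-schema, just the shape of the idea.
import jsonschema

supply_check = {
    "patternProperties": {
        "^.*-supply$": {              # matches vdd-supply, vcc3-supply, ...
            "type": "array",          # bracketed cells become nested arrays
            "items": {
                "type": "array",
                "maxItems": 1,        # a supply is a single phandle cell
                "items": {"type": "integer"},
            },
        }
    }
}

node = {
    "compatible": ["acme,widget"],    # made-up example node
    "vdd-supply": [[0x1234]],         # ok: one phandle cell
    "vcc-supply": "not-a-phandle",    # wrong type, gets reported
}

try:
    jsonschema.validate(node, supply_check)
except jsonschema.ValidationError as err:
    print(err.message)

In the real flow the instance data comes from dtc's yaml output rather than
a hand-written dict, but the shape of the rule is the same.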

>  In addition it was not
> +clear to me that it would be possible to do the sort of deep validation that is
> +desirable with the kernel DT files, for example checking that subnodes do not
> +conflict, or handling phandles which link nodes in both directions. Some of this
> +has in fact been handled by modifying dtc, which seems like a better approach.
> +But it has its limits, without deep knowledge of the schema.

This I do agree with, but I don't see an example of this problem
solved in your RFC (maybe I missed it). As I mentioned in the other
patch, I think extending the dtc checks is the way forward. We need to
be able to extend the checks easily. A sort of checks plugin which
supports at least C and python language bindings would be awesome. You
probably know how to do that better than me.
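
To sketch what I mean by a plugin (all names below are made up, nothing
like this exists in dtc today), the Python side could be as simple as a
class per check plus a registration entry point:

# Hypothetical plugin shape only; dtc has no such Python hooks today.
class SupplyPhandleCheck:
    """Warn when a *-supply property does not reference a regulator node."""

    name = "supply_phandle"

    def check_node(self, dt, node):
        for prop in node.props.values():
            if not prop.name.endswith("-supply"):
                continue
            target = dt.node_by_phandle(prop.as_phandle())
            if target is None or "regulator" not in target.name:
                yield f"{node.path}: {prop.name} does not point at a regulator"

def register(checker):
    # Entry point the plugin loader would call when the module is loaded.
    checker.add_check(SupplyPhandleCheck())

The C side would need an equivalent set of callbacks, but either way the
point is that adding a check shouldn't mean patching dtc itself.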

> +
> +So I put together this proof-of-concept to show another approach, to seek
> +comments and see what people think.
> +
> +Bad things:
> +- It's Python
> +- It requires a new input format for the schema
> +- There would be a lot of work needed to turn this into production code
> +- Many others that I suspect you are about to point out
> +
> +Good things:
> +- It is quite easy to write the schema in Python
> +- The validation rules are easy to write and understand, even complex ones
> +- Highly complex validation rules (e.g. covering multiple linked nodes) are
> +    possible to write, offering a very powerful validation framework
> +- It is fast and works on the .dtb file directly

Fast is good, but I don't think working on dtbs is an advantage.
Using the dts and maintaining the bracketing (<>) in the yaml output
lets us do better type checking. I guess that is a work-around for not
parsing #.*-cells, at least for some of the cases, and it does mean we
have to be stricter about the source format. The latter is not a bad
thing though IMO.
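
For what it's worth, here is a small pylibfdt sketch (file name and node
path invented) of why the dtb is a poor starting point for type checking:
by the time you read a property back from the blob it is just bytes, so
<1 2>, <3 4> and <1 2 3 4> are indistinguishable:

# Reading a property back from a compiled blob with pylibfdt: the <>
# grouping from the source is gone, only a flat run of cells remains.
import struct
import libfdt

with open("example.dtb", "rb") as f:
    fdt = libfdt.Fdt(f.read())

node = fdt.path_offset("/demo")                 # invented node path
prop = fdt.getprop(node, "interrupts")          # bytes, no bracketing left
cells = struct.unpack(f">{len(prop) // 4}I", bytes(prop))
print(cells)     # (1, 2, 3, 4) however the source was bracketed

The yaml output keeps those groups as nested sequences, which is exactly
what the schema checks can key off.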

Rob


