Re: [PATCH] RFC: Python-based device-tree validation

Simon Glass <sjg@xxxxxxxxxxxx> · Mon, 20 May 2019 14:50:53 -0600

Hi Rob,

On Wed, 8 May 2019 at 17:58, Rob Herring <robh@xxxxxxxxxx> wrote:
>
> On Mon, Apr 29, 2019 at 5:34 PM Simon Glass <sjg@xxxxxxxxxxxx> wrote:
> >
> > This is an attempt to illustrate an alternative method of validating
> > device-tree schema, based on pylibfdt and Python-based schema files.
> >
> > Two tools are included:
> >
> >    validate_dts.py  - validates a .dts file based on schema files it
> >                         finds in the kernel
> >    binding_to_py.py - create Python schema files from an individual
> >                         .txt binding file
> >
> > Both tools are just at the proof-of-concept stage. See the instructions
> > at the top of each for usage.
> >
> > This DT validator came out of work in Chrome OS and was proven successful
> > in providing a simple validator which is extremely flexible (e.g. custom
> > code is easily added to check phandle targets and other non-trivial rules)
> > and easy to use. It provides simple error messages when it finds a
> > problem, and works directly from the compiled .dtb file. It is also quite
> > fast even in the face of large schemas and large DT files.
> >
> > Schema is handled by Python files in Documentation/devicetree/binding,
> > mirroring the existing .txt files. The above tool makes an attempt to
> > convert .txt to .py, but it is very basic at present.
> >
> > To try this you also need the kernel patch:
> >
> >    RFC: Example schema files written in Python
>
> I also replied to this, so I won't repeat what I said there.
>
> > diff --git a/README.kernel_validate b/README.kernel_validate
> > new file mode 100644
> > index 0000000..9c0a871
> > --- /dev/null
> > +++ b/README.kernel_validate
> > @@ -0,0 +1,118 @@
> > +Kernel Device-Tree Validator based on Python and pylibfdt
> > +=========================================================
> > +
> > +Background
> > +----------
> > +
> > +As part of the discussions at Linux Plumbers 2018 [1] I showed a few people a
> > +device-tree validator which uses a Python-based schema. This was developed for
> > +Chrome OS to deal with the validation problem there.
>
> I did go look at it then. :)

Oh, funny.

>
> > +From my someone limited understanding of YAML, JSON and validation in that
> > +world [2] it seems to me that it was very difficult to get that technology to
> > +validate the kernel DT files successfully: lots of complex tooling, regexs and
> > +potentially a need to move to yaml for the source format.
>
> I don't agree with this part. We're already able to do lots of
> validation even though we have only converted over a handful of
> bindings (getting people to fix the issues is the problem).

I really like the dtc validation that is in place. It seems to work well.

>
> Complex tooling? It's 600 or so lines for the library and a couple of
> tools at 50-100 lines each. That's less than this patch.

This patch actually includes the validator, whereas I assume you are
relying on the yaml/json validator.

>
> regex's are very helpful and I would say required. For example, it's a
> 2 line addition to add type checking for all occurrences of
> "^.*-supply$".

Yes I agree, and that's a nice idea. See PropString() which provides
for a regex - it only works on values at present.

>
> We only need yaml as an intermediate format. That's already in place in dtc.

Yes.

>
> >  In addition it was not
> > +clear to me that it would be possible to do the sort of deep validation that is
> > +desirable with the kernel DT files, for example checking subnodes do not
> > +conflict, handling phandles which link nodes in both directions. Some of this
> > +has in fact been handled by mnodifying dtc, which seems like a better approach.
> > +But it has its limits, without deep knowledge of the schema.
>
> This I do agree with, but I don't see an example of this problem
> solved in your RFC (maybe I missed it).

I had an example which checks that an int array is monotonically
increasing and has no duplicates (e.g. for backlight
brightness-levels) but I did not include it.

> As I mentioned in the other
> patch, I think extending the dtc checks is the way forward.

I certainly like this, so far as it goes to date.

> We need to
> be able to extend the checks easily. A sort of checks plugin which
> supports at least C and python language bindings would be awesome. You
> probably know how to do that better than me.

I worry that we are conflating two goals, here:

1. Compiling the source code into a binary format, without access to
the schema ("compiler")
2. Validating the code against a schema ("validator")

To my mind, I could give you perfectly valid .dtb output which is does
not match a particular schema.

Are you suggesting that dtc should accept a directory full of schema
files, too? I'm just not quite sure I understand the direction here.

>
> > +
> > +So I put together this proof-of-concept to show another approach, to seek
> > +comments and see what people think.
> > +
> > +Bad things:
> > +- It's Python
> > +- It requires a new input format for the schema
> > +- There would be a lot of work needed to turn this into production code
> > +- Many others that I suspect you are about to point out
> > +
> > +Good things:
> > +- It is quite easy to write the schema in Python
> > +- The validation rules are easy to write and understand, even complex ones
> > +- Highly complex validation rules (e.g. covering multiple linked node) are
> > +    possible to write, offering a very powerful validation framework
> > +- It is fast and works on the .dtb file directly
>
> Fast is good, but I don't think working on dtb's is an advantage.
> Using the dts and maintaining the bracketing (<>) in the yaml output
> lets us do better type checking.

Yes, well I suppose you've seen my proposal on adding type checking.
Presumably you mean post-processed yaml (with the #defines resolved,
for example)? I can certainly see the benefits.

> I guess that is a work-around for not
> parsing #.*-cells at least for some of the cases, and it does mean we
> have to be stricter on the source format. The latter is not a bad
> thing though IMO.

OK. What are your thoughts about the other 'bad / good things' points?

Regards,
Simon