On Wed, Aug 02, 2017 at 11:04:14PM +0100, Grant Likely wrote:
> I'll randomly choose this point in the thread to jump in...
>
> On Wed, Aug 2, 2017 at 4:09 PM, David Gibson
> <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Jul 27, 2017 at 08:51:40PM -0400, Tom Rini wrote:
> >> If the common dts source file was in yaml, binding docs would be written
> >> so that we could use them as validation and hey, the above wouldn't ever
> >> have happened.  And I'm sure this is not the only example that's in-tree
> >> right now.  These kind of problems create an artificially high barrier
> >> to entry in a rather important area of the kernel (you can't trust the
> >> docs, you have to check around the code too, and of course the code
> >> might have moved since the docs were written).
> >
> > Yeah, problems like that suck.  But I don't see that going to YAML
> > helps avoid them.  It may have a number of neat things it can do, but
> > yaml won't magically give you a way to match against bindings.  You'd
> > still need to define a way of describing bindings (on top of yaml or
> > otherwise) and implement the matching of DTs against bindings.
>
> I'm going to try and apply a few constraints.  I'm using the following
> assumptions for my reply.
> 1) DTS files exist, will continue to exist, and new ones will be
> created for the foreseeable future.
> 2) DTB is the format that the kernel and U-Boot consume

Right.  Regardless of (1), (2) is absolutely the case.  Contrary to
the initial description, the proposal in this thread really seems to
be about completely reworking the device tree data model.  While in
isolation the JSON/yaml data model is, I think, superior to the dtb
one, attempting to change over now lies somewhere between hopelessly
ambitious and completely bonkers, IMO.

> 3) Therefore the DTS->DTB workflow is the important one.  Anything that
> falls outside of that may be interesting, but it distracts from the
> immediate problem and I don't want to talk about it here.
>
> For schema documentation and checking, I've been investigating how to
> use JSON Schema to enforce DT bindings. Specifically, I've been using
> the JSONSchema Python library which strictly speaking doesn't operate
> on JSON or YAML, but instead operates directly on Python data
> structures. If that data happens to be imported from a DTS or DTB, the
> JSON Schema engine doesn't care.

So, inspired by this thread, I've had a bit of a look at some of these
json/python schema systems, and thought about how they'd apply to dtb.

It certainly seems worthwhile to exploit those schema systems if we
can, since they seem pretty close to what's wanted, at least
flavour-wise.  But I see some difficulties that don't have obvious (to
me) solutions.

The main one is that they're built around the thing being checked
knowing its own types (at least in terms of basic scalar/sequence/map
structure).  I guess that's the motivation behind Pantelis's yamldt
notion, but that doesn't address the problem of validating dtbs in the
absence of source.  In a dtb you just have bytestrings, which means
the bottom-level types in a suitable schema need to know how to
extract themselves from a bytestream - and in DT that often means
getting an element length from a different property, or even a
different node (#*-cells etc.).  AFAICT the json schema languages I
looked at don't really have a notion like that.

The other is that because we don't have explicit sequences, a schema
matching a sequence either needs an explicit number of entries (taken
from another property, or preceding the sequence), or it has to be the
last thing in the property's pattern (for basically the same reason
that C99 doesn't allow flexible array members anywhere except at the
end of a structure).

Or to look at it in a more JSONSchema-specific way: before you examine
the schema, you can't pull the info in the dtb into Python structures
any more specific than "bytestring".
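For concreteness, here's the sort of thing I mean - a quick Python
sketch (values invented, not taken from any real tree or tool) of
pulling apart a 'reg' bytestring using cell counts that live in the
parent node:

```python
# Rough sketch: decoding a dtb 'reg' property bytestring.  The entry
# width depends on #address-cells and #size-cells in the *parent*
# node, which is exactly the cross-node knowledge a plain JSON Schema
# engine has nowhere to express.

# Hypothetical parent node properties (values already decoded).
parent = {'#address-cells': 2, '#size-cells': 1}

# Raw 'reg' value as it sits in the dtb: one entry, 64-bit address
# 0x10000000, 32-bit size 0x1000, all big-endian 32-bit cells.
reg = bytes.fromhex('0000000010000000' '00001000')

addr_bytes = 4 * parent['#address-cells']
size_bytes = 4 * parent['#size-cells']
entry_bytes = addr_bytes + size_bytes
assert len(reg) % entry_bytes == 0

entries = [(int.from_bytes(reg[i:i + addr_bytes], 'big'),
            int.from_bytes(reg[i + addr_bytes:i + entry_bytes], 'big'))
           for i in range(0, len(reg), entry_bytes)]

print([(hex(a), hex(s)) for a, s in entries])  # [('0x10000000', '0x1000')]
```

Until you've looked at the parent's #address-cells / #size-cells you
can't even say how many entries the property holds, which is the
cross-reference I couldn't find a way to express in the schema
languages.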
Have I missed some features in JSONSchema that help with this, or do
you have a clever solution already?

> The work Pantelis has done here is important because it defines a
> specific data model for DT data. That data model must be defined
> before schema files can be written, otherwise they'll be testing for
> the wrong things. However, rather than defining a language specific
> data model (ie. Python), specifying it in YAML means it doesn't depend
> on any particular language.

Urgh.. except that dtb already defines a data model, and it's not the
same as the JSON/yaml data model.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson