On Thu, Jul 05, 2018 at 08:43:49AM -0600, Rob Herring wrote:
> On Wed, Jul 4, 2018 at 9:26 PM David Gibson
> <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Dec 18, 2017 at 11:20:38AM +0000, Grant Likely wrote:
> > > On Mon, Dec 18, 2017 at 4:34 AM, David Gibson
> > > <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > On Tue, Dec 12, 2017 at 12:48:10PM +0000, Grant Likely wrote:
> > > >> On Tue, Dec 12, 2017 at 6:14 AM, David Gibson
> > > >> <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >> > On Tue, Nov 28, 2017 at 11:57:09AM +0000, Grant Likely wrote:
> > > >> >> The current code throws away all the data type and grouping
> > > >> >> information when parsing the DTS source file, which makes it
> > > >> >> difficult to reconstruct the data format when emitting a
> > > >> >> format that can express data types (i.e. dts and yaml). Use
> > > >> >> the marker list to mark the beginning and end of each integer
> > > >> >> array block (<> and []), the datatype contained in each
> > > >> >> (8, 16, 32 & 64 bit widths), and the start of each string.
> > > >> >>
> > > >> >> At the same time, factor out the heuristic code used to guess
> > > >> >> a property type at emit time. It is a pretty well defined
> > > >> >> standalone block that could be used elsewhere, for instance,
> > > >> >> when emitting YAML source. Factor it out into a separate
> > > >> >> function so that it can be reused, and also to simplify the
> > > >> >> write_propval() function.
> > > >> >>
> > > >> >> When emitting, group integer output back into the same groups
> > > >> >> as the original source and use the REF_PATH and REF_PHANDLE
> > > >> >> markers to emit the node reference instead of a raw path or
> > > >> >> phandle.
> > > >> >>
> > > >> >> Signed-off-by: Grant Likely <grant.likely@xxxxxxx>
> > > >> >
> > > >> > I'm a bit dubious how well forcing the marker mechanism to do
> > > >> > all this stuff it was never intended for can work in the long
> > > >> > term. Still, it's an interesting experiment.
> > > >>
> > > >> As long as the actual data is stored as a flat buffer, the
> > > >> markers mechanism works quite well for this. I tried doing
> > > >> something entirely separate, and it turned out to be awful.
> > > >> Another alternative is to break up the flat buffer into a chain
> > > >> of data blocks with attached type information, but that is a
> > > >> very invasive change.
> > > >>
> > > >> This approach has the advantage of being robust in accepting
> > > >> both typed and anonymous data. If the markers are not there then
> > > >> the existing behaviour is maintained; otherwise dtc can emit a
> > > >> higher-fidelity version of the source.
> > > >
> > > > Hm, true. The approach is growing on me. I guess what I'm still
> > > > dubious about is how much this type annotation can get us to
> > > > approach the YAML model. For example, YAML can distinguish
> > > > between [ [1, 2], [3, 4] ] and [1, 2, 3, 4], which isn't really
> > > > feasible in dtc.
> > >
> > > To start with, I'm constraining what is permissible in the YAML
> > > encoding. So, even though YAML can encode multiple nested lists,
> > > I'm not permitting that in this iteration. To take an example:
> > >
> > >   in dts:   reg = <0x1000 0x100>, <0x4000 0x300>;
> > >   in YAML:  reg: [ [0x1000, 0x100], [0x4000, 0x300] ]
> > >
> > >   in dts:   compatible = "acme,uart9000", "ns16550";
> > >   in YAML:  compatible: [ "acme,uart9000", "ns16550" ]
> > >
> > >   in dts:   #size-cells = <1>;
> > >   in YAML:  "#size-cells": [ [ 1 ] ]
> > >
> > >   in dts:   uint16-prop = /bits/ 16 <31>;
> > >   in YAML:  uint16-prop: [ !uint16 [31] ]
> > >
> > > I'm not allowing anything outside that pattern. So, the following
> > > are all disallowed currently:
> > >
> > >   reg: [0x1000, 0x100, 0x4000, 0x300]
> > >       /* integers must be in a nested list - maps to <...> in dts */
> > >   compatible: "ns16550"
> > >       /* not encoded into a list */
> > >   reg: [ [ [0x4, 0xffff0000], 0x80000], [ [0x4, 0xfffe0000], 0x40000] ]
> > >       /* triple nesting not allowed */
> >
> > Sorry, I meant to make a comment on this months ago, but never got
> > around to it. In terms of the immediate problem here, this seems
> > like a reasonable approach. However, it kind of underscores the
> > lingering worries I have about using YAML as a DT encoding format.
> > In the JSON/YAML world, 1 and [ [ 1 ] ] are different things, and
> > using the latter as a way of encoding what's essentially a plain
> > integer would be pretty perverse.
>
> For the schema, I've taken the approach of converting single values
> to arrays/matrices within the validation tools. Otherwise, we end up
> with a lot of boilerplate in the schema docs.

Right, I'm not surprised you needed that.

> > So, I'm concerned that if we have YAML front-and-centre to the
> > user, it will be pretty misleading as to what is and isn't
> > possible.
>
> I would like to make the YAML encoding just an intermediate format
> to use for validation purposes.

Yeah, I think that's a safer approach.

> This would give us some flexibility versus having a fixed format
> that's set in stone. And we wouldn't need to figure out how to
> support includes, /directives/, etc., or to support YAML input. So
> how do we support YAML in dtc and not make it front-and-centre?

Right, which I think amounts to not encouraging its use as an actual
source format. A few things to consider:

 * Maybe use plain JSON instead of YAML as our intermediate validation
   format - it's intended more for machines than humans.

 * Don't put much effort into pretty printing the JSON / YAML, as long
   as it's parseable by the validation chain.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
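
As a rough illustration of the constrained encoding Grant describes, a
PyYAML representer can attach a !uintNN tag for non-default cell widths
while leaving ordinary 32-bit cell arrays as plain nested sequences.
This is only a sketch of the idea, not dtc's actual YAML emitter; the
TypedArray wrapper and represent_typed helper are invented for the
example, and PyYAML prints the integers back in decimal:

    import yaml

    class TypedArray(list):
        # Hypothetical wrapper carrying the element width that a dtc
        # marker would record for an integer array block.
        def __init__(self, bits, values):
            super().__init__(values)
            self.bits = bits

    def represent_typed(dumper, arr):
        # Non-default widths get a !uintNN tag; plain 32-bit cell
        # arrays stay ordinary YAML sequences, per the pattern above.
        if arr.bits == 32:
            return dumper.represent_sequence('tag:yaml.org,2002:seq',
                                             list(arr))
        return dumper.represent_sequence('!uint%d' % arr.bits, list(arr))

    yaml.add_representer(TypedArray, represent_typed)

    props = {
        'reg': [TypedArray(32, [0x1000, 0x100]),
                TypedArray(32, [0x4000, 0x300])],
        'uint16-prop': [TypedArray(16, [31])],
    }
    print(yaml.dump(props, default_flow_style=True))
    # -> {reg: [[4096, 256], [16384, 768]], uint16-prop: [!uint16 [31]]}

The same pattern would extend to !uint8 and !uint64 for the other
/bits/ widths that the patch tracks via markers.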
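
The single-value-to-matrix coercion Rob mentions for the validation
tools can likewise be sketched in a few lines (again an illustration
only; coerce_to_matrix is a hypothetical helper, and the real
dt-schema tooling is more involved):

    def coerce_to_matrix(value):
        # Promote a property value to the canonical matrix ([[...]])
        # form so schemas only ever have to describe one shape:
        #   1          -> [[1]]
        #   [1, 2]     -> [[1, 2]]
        #   [[1], [2]] -> returned unchanged
        # Assumes well-formed input (no mixed scalar/list elements).
        if not isinstance(value, list):
            return [[value]]
        if not any(isinstance(v, list) for v in value):
            return [value]
        return value

    assert coerce_to_matrix(1) == [[1]]
    assert coerce_to_matrix([0x1000, 0x100]) == [[0x1000, 0x100]]
    assert coerce_to_matrix([[0x1000, 0x100], [0x4000, 0x300]]) == \
           [[0x1000, 0x100], [0x4000, 0x300]]

Normalising the shapes once in the tooling is what avoids the
per-property boilerplate in the schema documents themselves.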