On Fri, Jul 6, 2018 at 11:21 AM Rob Herring <robh@xxxxxxxxxx> wrote:
>
> On Thu, Jul 5, 2018 at 7:18 PM David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jul 05, 2018 at 08:43:49AM -0600, Rob Herring wrote:
> > > On Wed, Jul 4, 2018 at 9:26 PM David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Dec 18, 2017 at 11:20:38AM +0000, Grant Likely wrote:
> > > > > On Mon, Dec 18, 2017 at 4:34 AM, David Gibson
> > > > > <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > On Tue, Dec 12, 2017 at 12:48:10PM +0000, Grant Likely wrote:
> > > > > >> On Tue, Dec 12, 2017 at 6:14 AM, David Gibson
> > > > > >> <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > >> > On Tue, Nov 28, 2017 at 11:57:09AM +0000, Grant Likely wrote:
> > > > > >> >> The current code throws away all the data type and grouping information
> > > > > >> >> when parsing the DTS source file, which makes it difficult to
> > > > > >> >> reconstruct the data format when emitting a format that can express data
> > > > > >> >> types (i.e. dts and yaml). Use the marker list to mark the beginning and
> > > > > >> >> end of each integer array block (<> and []), the datatype contained in
> > > > > >> >> each (8, 16, 32 & 64 bit widths), and the start of each string.
> > > > > >> >>
> > > > > >> >> At the same time, factor out the heuristic code used to guess a property
> > > > > >> >> type at emit time. It is a pretty well defined standalone block that
> > > > > >> >> could be used elsewhere, for instance, when emitting YAML source. Factor
> > > > > >> >> it out into a separate function so that it can be reused, and also to
> > > > > >> >> simplify the write_propval() function.
> > > > > >> >>
> > > > > >> >> When emitting, group integer output back into the same groups as the
> > > > > >> >> original source and use the REF_PATH and REF_PHANDLE markers to emit
> > > > > >> >> the node reference instead of a raw path or phandle.
> > > > > >> >>
> > > > > >> >> Signed-off-by: Grant Likely <grant.likely@xxxxxxx>
> > > > > >> >
> > > > > >> > I'm a bit dubious how well forcing the marker mechanism to do all this
> > > > > >> > stuff it was never intended for can work in the long term. Still,
> > > > > >> > it's an interesting experiment.
> > > > > >>
> > > > > >> As long as the actual data is stored as a flat buffer, the markers
> > > > > >> mechanism works quite well for this. I tried doing something entirely
> > > > > >> separate, and it turned out to be awful. Another alternative is to
> > > > > >> break up the flat buffer into a chain of data blocks with attached
> > > > > >> type information, but that is a very invasive change.
> > > > > >>
> > > > > >> This approach has the advantage of being robust in accepting both
> > > > > >> typed and anonymous data. If the markers are not there then the
> > > > > >> existing behaviour can be maintained, but otherwise it can emit a
> > > > > >> higher fidelity of source language.
> > > > > >
> > > > > > Hm, true. The approach is growing on me. I guess what I'm still
> > > > > > dubious about is how much this type annotation can get us to approach
> > > > > > the YAML model. For example, YAML can distinguish between [ [1, 2],
> > > > > > [3, 4] ] and [1, 2, 3, 4], which isn't really feasible in dtc.
> > > > >
> > > > > To start with I'm constraining what is permissible in the YAML
> > > > > encoding. So, even though YAML can encode multiple nested lists, I'm
> > > > > not permitting that in this iteration.
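Grant's flat-buffer-plus-markers design can be modelled in a few lines of Python to show why it is sufficient for regrouping typed output at emit time. This is purely an illustrative sketch: the function, the marker encoding, and all names here are invented for the example, and are not dtc's actual C data structures.

```python
import struct

# Illustrative marker kinds; dtc's real marker types live in its C code.
TYPE_UINT32_BEGIN = "u32-begin"
TYPE_END = "end"

def emit_groups(buf, markers):
    """Re-group a flat byte buffer into typed cell lists using markers.

    markers is a list of (offset, kind) pairs sorted by offset, where
    'u32-begin' opens a <...> group of 32-bit cells and 'end' closes it.
    The property data itself stays a flat big-endian byte buffer.
    """
    groups = []
    start = None
    for offset, kind in markers:
        if kind == TYPE_UINT32_BEGIN:
            start = offset
        elif kind == TYPE_END and start is not None:
            ncells = (offset - start) // 4
            cells = struct.unpack(">%dI" % ncells, buf[start:offset])
            groups.append(list(cells))
            start = None
    return groups

# A 'reg' property <0x1000 0x100>, <0x4000 0x300> stored flat:
buf = struct.pack(">4I", 0x1000, 0x100, 0x4000, 0x300)
markers = [(0, "u32-begin"), (8, "end"), (8, "u32-begin"), (16, "end")]
print(emit_groups(buf, markers))  # [[4096, 256], [16384, 768]]
```

Without the markers the buffer is just 16 anonymous bytes, which is exactly the fallback case where the existing type-guessing heuristics still apply.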
> > > > > To take an example:
> > > > >
> > > > > in dts:  reg = <0x1000 0x100>, <0x4000 0x300>;
> > > > > in YAML: reg: [ [0x1000, 0x100], [0x4000, 0x300] ]
> > > > >
> > > > > in dts:  compatible = "acme,uart9000", "ns16550";
> > > > > in YAML: compatible: [ "acme,uart9000", "ns16550" ]
> > > > >
> > > > > in dts:  #size-cells = <1>;
> > > > > in YAML: "#size-cells": [ [ 1 ] ]
> > > > >
> > > > > in dts:  uint16-prop = /bits/ 16 <31>;
> > > > > in YAML: uint16-prop: [ !uint16 [31] ]
> > > > >
> > > > > I'm not allowing anything outside that pattern. So, the following are
> > > > > all disallowed currently:
> > > > >
> > > > > reg: [0x1000, 0x100, 0x4000, 0x300]
> > > > >     /* integers need to be in a list - maps to <...> in dts */
> > > > > compatible: "ns16550"
> > > > >     /* not encoded into a list */
> > > > > reg: [ [ [0x4, 0xffff0000], 0x80000], [ [0x4, 0xfffe0000], 0x40000] ]
> > > > >     /* triple nesting not allowed */
> > > >
> > > > Sorry, I meant to make a comment on this months ago, but never got
> > > > around to it. In terms of the immediate problem here, this seems like
> > > > a reasonable approach. However, it kind of underscores the lingering
> > > > worries I have about using YAML as a DT encoding format. In the
> > > > JSON/YAML world, 1 and [ [ 1 ] ] are different things, and using the
> > > > latter as a way of encoding what's essentially a plain integer would
> > > > be pretty perverse.
> > >
> > > For the schema, I've taken the approach of converting single values to
> > > arrays/matrices within the validation tools. Otherwise, we end up with
> > > a lot of boilerplate in schema docs.
> >
> > Right, I'm not surprised you needed that.
> >
> > > > So, I'm concerned that if we have YAML front-and-centre to the user,
> > > > it will be pretty misleading as to what is and isn't possible.
> > >
> > > I would like to make the YAML encoding just an intermediate format to
> > > use for validation purposes.
> >
> > Yeah, I think that's a safer approach.
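The constrained encoding Grant describes above is mechanical enough to check in a few lines. The sketch below is my reading of the rule, with an invented helper name, and is not code from the series: a property value must be a list whose elements are either strings or flat lists of integers, and nothing else.

```python
def is_valid_prop(value):
    """Check a decoded YAML property value against the constrained
    encoding: a list whose elements are strings or flat integer lists.
    Bare scalars, bare strings, and triple nesting are all rejected.
    """
    if not isinstance(value, list):
        return False  # e.g. compatible: "ns16550" - not wrapped in a list
    for elem in value:
        if isinstance(elem, str):
            continue  # a string entry, e.g. one compatible value
        if isinstance(elem, list) and all(
            isinstance(cell, int) and not isinstance(cell, bool)
            for cell in elem
        ):
            continue  # one <...> group of integer cells
        return False  # bare integer at the top level, or nesting deeper
    return True

assert is_valid_prop([[0x1000, 0x100], [0x4000, 0x300]])    # reg
assert is_valid_prop(["acme,uart9000", "ns16550"])          # compatible
assert is_valid_prop([[1]])                                 # "#size-cells"
assert not is_valid_prop("ns16550")                         # bare string
assert not is_valid_prop([0x1000, 0x100, 0x4000, 0x300])    # bare ints
assert not is_valid_prop([[[0x4, 0xFFFF0000], 0x80000]])    # triple nesting
```

In other words, every property decodes to a matrix: rows are `<...>` groups or strings, cells are scalars.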
> >
> > > This would give us some flexibility
> > > versus having a fixed format that's set in stone. And we wouldn't need
> > > to figure out support for things such as includes, /directives/, etc.,
> > > or support YAML input. So how do we support YAML in dtc and not make it
> > > front-and-centre?
> >
> > Right, which I think amounts to not encouraging its use as an actual
> > source format.
>
> Lack of input support in dtc will do that. I'm using it as a source
> format currently for test cases, but those could easily be converted
> to dts.
>
> > A few things to consider:
> >   * Maybe use plain JSON instead of YAML as our intermediate
> >     validation format - it's intended more for machines than humans
>
> It's an easy switch on the validation side since the Python
> representations are the same. It will be more work switching dtc, as C
> libraries don't have a common representation.
>
> If the output was completely hidden, then it really wouldn't matter.
> But for debugging purposes, a human may be looking at the output, so I
> think it would be better to use YAML everywhere than a mixture. It's
> not exactly hard to understand JSON, but it is potentially one more
> thing for people.
>
> I've thought some about using JSON already, because I also output the
> processed schema docs to speed up the validation. I thought using JSON
> might be a little faster, but I did some tests and didn't find any
> significant difference.
>
> >   * Don't put much effort into pretty-printing the JSON / YAML, as
> >     long as it's parseable by the validation chain
>
> JSON can't have comments, so we wouldn't be able to add source
> annotations. Though, I don't think libyaml gives us a way to add
> comments anyway. So, I guess we're not putting any effort into
> pretty-printing.

Another issue with using JSON is that only YAML supports tags (!u8,
!u16, etc.). Though currently for validation that information is just
stripped, and I'm not exactly sure how we'd use and check it.
The schema would need to specify the type, and we'd need to somehow
maintain the size info in Python. Maybe Grant had some idea...

Rob
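On the open question of keeping the size info around in Python: one possibility (a sketch of my own, not something proposed in the thread) is to register PyYAML constructors for the width tags, so tagged cells arrive as an int subclass that remembers its width instead of the tag being stripped. Tag names here follow the !u8/!u16 spelling mentioned above.

```python
import yaml

class TypedInt(int):
    """An int that remembers the cell width it was tagged with."""
    def __new__(cls, value, width):
        obj = super().__new__(cls, value)
        obj.width = width
        return obj

def make_constructor(width):
    # Build a constructor turning a !uN-tagged sequence into TypedInts.
    def construct(loader, node):
        return [TypedInt(v, width) for v in loader.construct_sequence(node)]
    return construct

for width in (8, 16, 32, 64):
    yaml.SafeLoader.add_constructor("!u%d" % width, make_constructor(width))

doc = yaml.safe_load("uint16-prop: [ !u16 [31] ]")
cells = doc["uint16-prop"][0]
print(cells[0], cells[0].width)  # 31 16
```

A schema checker could then compare each cell's `.width` attribute against the width the binding specifies, while everything downstream that only cares about the value still sees a plain int.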