Re: aliases node - valid char set?

David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> · Fri, 20 Apr 2018 16:16:38 +1000

On Thu, Apr 12, 2018 at 08:20:46AM -0600, David Brown wrote:
> On Thu, Apr 12, 2018 at 02:44:20PM +1000, David Gibson wrote:
> 
> > Back in the OF days there might have been more restrictions based on
> > special characters in the Forth environment, to prevent paths with
> > aliases being confused for something else.  Not sure.
> 
> Not sure how much IEEE 1275 really matters these days, but it specifies
> node names as:
> 
>    driver-name@unit-address:device-arguments
> 
> with the driver name [a-zA-Z0-9,._+-]+ (the comma being a convention),

Right.  The dtc lexer definition of PROPNODECHAR was based on that
description in §3.2.1.1 of 1275.  And... now that I come to look back
at it, assuming the same set of chars for property and node names
wasn't really correct.  The set was expanded to include a few other
things because they were present in existing device trees of the time,
despite what IEEE 1275 said.

> the address is "bus dependent", and the device arguments being all
> printable characters other than "/", ":", and "@".

Right, but we never use device arguments in flat trees, so they don't
matter.
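
For illustration only, here is a rough Python sketch of the 1275
node-name split quoted above.  The function name and the regex are
made up for the example and are not anything in dtc, and the unit
address is treated as opaque since 1275 leaves it bus dependent:

```python
import re

# Driver name per the 1275 description quoted above: [a-zA-Z0-9,._+-]+
# Unit address is "bus dependent", so accept anything but the separators
# (an assumption made for this sketch).
# Device arguments: anything except "/", ":" and "@"; printability is
# not checked here.
NODE_NAME = re.compile(
    r"(?P<driver>[a-zA-Z0-9,._+-]+)"
    r"(?:@(?P<unit>[^/:@]*))?"
    r"(?::(?P<args>[^/:@]*))?"
)

def split_node_name(name):
    """Return (driver, unit_address, device_arguments), or None if malformed."""
    m = NODE_NAME.fullmatch(name)
    if m is None:
        return None
    return m.group("driver"), m.group("unit"), m.group("args")
```

With this, split_node_name("uart@3f8:115200") gives
('uart', '3f8', '115200'), while a flat-tree style name such as
"memory@0" comes back with None in the third slot.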

> The "/" obviously being because it is the path separator.
> 
> Alias name is any sequence of printable characters, other than "/",
> "\", ":", "[", "]", and "@".

> Property names do not allow upper-case characters, or "/", "\", ":",
> "[", "]", and "@".

Hrm.  Which is a bit odd, since aliases are themselves properties of
the /aliases node, so alias names should obey all the same
restrictions as property names (including no uppercase).

dtc makes the restrictions for node, property and alias names
identical.
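
To make the mismatch concrete, a small Python sketch of the two 1275
rules as quoted above.  It assumes "printable" means 7-bit ASCII 0x21
to 0x7e, which is narrower than the 8859-1 wording in the standard,
and the function names are hypothetical.  This is not what dtc does,
since dtc applies one character set to all three kinds of name:

```python
# Characters 1275 forbids in both alias and property names (as quoted).
FORBIDDEN = set('/\\:[]@')

def is_printable_ascii(ch):
    # Assumption: printable means 0x21..0x7e, i.e. no space, no 8-bit chars.
    return 0x21 <= ord(ch) <= 0x7E

def valid_alias_name(name):
    """Non-empty, printable, and free of the forbidden characters."""
    return bool(name) and all(
        is_printable_ascii(c) and c not in FORBIDDEN for c in name
    )

def valid_property_name(name):
    """Same as an alias name, plus no upper-case characters."""
    return valid_alias_name(name) and not any(c.isupper() for c in name)
```

Under these rules "Serial0" is accepted as an alias name but rejected
as a property name, which is exactly the oddity.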

> It does specify a specific encoding of 8859-1, which is a bit annoying
> in this Unicode world.  Many bytes of UTF-8 would be considered
> "non-printable" in 8859-1.

Yeah, that's kinda crap.  I think that's an argument for - whatever
else - keeping these to 7-bit ASCII, so we don't have character set
issues.
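
A quick Python illustration of the encoding problem, assuming that
bytes below 0x20 and in the range 0x7f to 0x9f count as non-printable
in 8859-1 (the upper range is C1 controls or undefined, depending on
which document you read).  The helper name is hypothetical:

```python
def nonprintable_latin1_bytes(text):
    """UTF-8 encode text; return the bytes that are not printable in 8859-1."""
    return [b for b in text.encode("utf-8") if b < 0x20 or 0x7F <= b <= 0x9F]
```

For example "€" encodes as e2 82 ac in UTF-8, and 0x82 lands in the
non-printable range.  "é" encodes as c3 a9, which are both printable
in 8859-1 but would be read as two unrelated characters.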

> I think mainly the restricted characters would matter, for parsing
> reasons (although the above suggests that "{" and "}" would be allowed
> in an identifier, which, although allowed by FORTH, is not going to be
> parsed that way by DTC).
> 
> FORTH's rules were pretty simple, a word was a string of characters
> separated by a space.  There aren't really any restrictions on the
> names, although names that look like numbers supersede that number, so
> aren't really a good idea.

Ah, ok.

> The DTC lexer being quite different.

It did actually derive from the same place, but yes, it has diverged a
bit, based partly on practicalities and partly on what was actually
found in the wild.

Without a compelling reason, I'm disinclined to widen the set of
allowed characters.  We can always widen it later, but if we widen now
and it turns out to be problematic, going back could be very painful.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson