Re: [PATCH 1/4] pylibfdt: add Property.as_stringlist()

Luca Weiss <luca@xxxxxxxxx> · Thu, 20 Jan 2022 20:24:31 +0100

Hi,

On Dienstag, 18. Jänner 2022 11:08:22 CET David Gibson wrote:
> On Wed, Jan 05, 2022 at 04:48:31PM -0600, Rob Herring wrote:
> > On Sat, Dec 25, 2021 at 7:26 AM Luca Weiss <luca@xxxxxxxxx> wrote:
> > > Add a new method for decoding a string list property, useful for e.g.
> > > the "reg-names" property.
> > > 
> > > Also add a test for the new method.
> > > 
> > > Signed-off-by: Luca Weiss <luca@xxxxxxxxx>
> > > ---
> > > 
> > >  pylibfdt/libfdt.i       | 7 +++++++
> > >  tests/pylibfdt_tests.py | 8 ++++++++
> > >  2 files changed, 15 insertions(+)
> > > 
> > > diff --git a/pylibfdt/libfdt.i b/pylibfdt/libfdt.i
> > > index 9ccc57b..c81b504 100644
> > > --- a/pylibfdt/libfdt.i
> > > +++ b/pylibfdt/libfdt.i
> > > 
> > > @@ -724,6 +724,13 @@ class Property(bytearray):
> > >              raise ValueError('Property contains embedded nul
> > >              characters')
> > >          
> > >          return self[:-1].decode('utf-8')
> > > 
> > > +    def as_stringlist(self):
> > > +        """Unicode is supported by decoding from UTF-8"""
> > > +        if self[-1] != 0:
> > > +            raise ValueError('Property lacks nul termination')
> > > +        parts = self[:-1].split(b'\x00')
> > > +        return list(map(lambda x: x.decode('utf-8'), parts))
> > 
> > Doesn't this result in multiple decode() calls when a single one would
> > work:
> > 
> > return data[:-1].decode(encoding='ascii').split('\0')
> 
> Uh.. I guess?  I feel like the split-then-decode makes more logical
> sense, since it's splitting a bytestring, then decoding the pieces as
> utf-8 strings.  That makes sense to me given that raw properties are
> bytestrings and can included multiple different datatypes and
> encodings in general.
> 
> In this specific case, decode-then-split would be fine as well, since
> \u00000 works as a separator unambiguously, but it still seems
> conceptually muddier to me.

The reason I made it this way was mostly because I didn't know you could have 
null bytes present in str.decode, I just remember horrible UnicodeDecodeErrors 
on invalid input from other projects.
If wanted I can make a patch changing to just one str.decode call as yes, it's 
surely more efficient than doing multiple. But it's also not like there will be 
100 parts of this property that's being decoded in a performance critical 
application so I think it's also okay like that.

Let me know what you think.
Regards
Luca