Re: [PATCH 1/4] pylibfdt: add Property.as_stringlist()

David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> · Fri, 21 Jan 2022 11:12:17 +1100

On Thu, Jan 20, 2022 at 08:24:31PM +0100, Luca Weiss wrote:
> Hi,
> 
> On Dienstag, 18. Jänner 2022 11:08:22 CET David Gibson wrote:
> > On Wed, Jan 05, 2022 at 04:48:31PM -0600, Rob Herring wrote:
> > > On Sat, Dec 25, 2021 at 7:26 AM Luca Weiss <luca@xxxxxxxxx> wrote:
> > > > Add a new method for decoding a string list property, useful for e.g.
> > > > the "reg-names" property.
> > > > 
> > > > Also add a test for the new method.
> > > > 
> > > > Signed-off-by: Luca Weiss <luca@xxxxxxxxx>
> > > > ---
> > > > 
> > > >  pylibfdt/libfdt.i       | 7 +++++++
> > > >  tests/pylibfdt_tests.py | 8 ++++++++
> > > >  2 files changed, 15 insertions(+)
> > > > 
> > > > diff --git a/pylibfdt/libfdt.i b/pylibfdt/libfdt.i
> > > > index 9ccc57b..c81b504 100644
> > > > --- a/pylibfdt/libfdt.i
> > > > +++ b/pylibfdt/libfdt.i
> > > > 
> > > > @@ -724,6 +724,13 @@ class Property(bytearray):
> > > >              raise ValueError('Property contains embedded nul
> > > >              characters')
> > > >          
> > > >          return self[:-1].decode('utf-8')
> > > > 
> > > > +    def as_stringlist(self):
> > > > +        """Unicode is supported by decoding from UTF-8"""
> > > > +        if self[-1] != 0:
> > > > +            raise ValueError('Property lacks nul termination')
> > > > +        parts = self[:-1].split(b'\x00')
> > > > +        return list(map(lambda x: x.decode('utf-8'), parts))
> > > 
> > > Doesn't this result in multiple decode() calls when a single one would
> > > work:
> > > 
> > > return data[:-1].decode(encoding='ascii').split('\0')
> > 
> > Uh.. I guess?  I feel like the split-then-decode makes more logical
> > sense, since it's splitting a bytestring, then decoding the pieces as
> > utf-8 strings.  That makes sense to me given that raw properties are
> > bytestrings and can included multiple different datatypes and
> > encodings in general.
> > 
> > In this specific case, decode-then-split would be fine as well, since
> > \u00000 works as a separator unambiguously, but it still seems
> > conceptually muddier to me.
> 
> The reason I made it this way was mostly because I didn't know you could have 
> null bytes present in str.decode, I just remember horrible UnicodeDecodeErrors 
> on invalid input from other projects.
> If wanted I can make a patch changing to just one str.decode call as yes, it's 
> surely more efficient than doing multiple. But it's also not like there will be 
> 100 parts of this property that's being decoded in a performance critical 
> application so I think it's also okay like that.

I agree, and I tend to prefer it the way it is, unless there's a
compelling reason to change.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Attachment:
signature.asc

Description: PGP signature