On Thu, Oct 13, 2022 at 02:05:39PM -0500, Rob Herring wrote: > On Thu, Oct 13, 2022 at 11:36 AM Quentin Kaiser > <quentin.kaiser@xxxxxxxxxx> wrote: > > > > Hi, > > > > We identified a “regression” between dtc 1.6.0 and 1.6.1 introduced by commit 9d7888cbf19c2930992844e69a097dc71e5a7354. > > It's always good practice to Cc the author of the commit and quote the subject. > > > As part of a firmware extraction framework that we built and open sourced (https://unblob.org), we have integration test files making sure our changes do not modify the way we extract specific file types. > > We do so by extracting integration test files and identifying changes using diff. Within this extraction framework, we have a handler that convert DTBs to DTS using the dtc command line tool. > > > > The test suite was failing on one of our colleague’s machine but not on ours, and we identified that the only difference was the DTC version (1.5.0 vs 1.6.1). > > > > You can reproduce it with the iMX6 Sabrelite DTB file available at https://github.com/FAlinux-SoftwareinLife/silfa/blob/master/OS/kernel/imx6q-sabrelite.dtb using this command: > > > > dtc -I dtb -O dts imx6q-sabrelite.dtb 2>/dev/null | grep min-voltage > > > > Here’s a diff showing the difference between the two generated DTS (one with 1.5.0, the other with 1.6.1): > > > > diff /tmp/1.5.0.dts /tmp/1.6.1.dts > > 852c852 > > < regulator-min-microvolt = <0xc3500>; > > --- > > > regulator-min-microvolt = "\0\f5"; > > 859c859 > > < min-voltage = <0xc3500>; > > --- > > > min-voltage = "\0\f5"; > > .dts output encoding from a dtb is a guess at best and is not > reliable. Anything that looks like ascii is going to get output that > way. That said, while a \0 at the start is a valid string in dts, it's > probably not likely in practice and we should not decode the data as > ascii in that case. Right. Like any decompilation, we have to make guesses as to what the original intent was. In case it wasn't clear from Rob's comment, those two outputs describe the exact same content in the dtb: In big-endian 0xc3500 is bytes: 0x00 0x0c 0x35 0x00 The string "\0\f5" is characters: '\0' '\f' '5' '\0' which have byte values: 0x00 0x0c 0x35 0x00 I think the behaviour change was introduced by 9d7888cb "dtc: Consider one-character strings as strings". Previously we used to guess "not a string" if there were as many or more \0s as non \0 characters, now it's only if there are strictly more \0s than non \0 characters. The change was made so that the fairly common string "/" would be rendered as a string, rather than as bytes [2f 00]. But you'll note that your example hits the same condition. One can imagine tweaks to these heuristics that would get this case "right": e.g. require at least one non-\0 character before each \0, or don't consider \f and other escapable control characters as "string" characters in isstring(). I'd consider patches that made such a change, but ultimately these are just a guess at the dt author's intention, so they can never be infallible. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Attachment:
signature.asc
Description: PGP signature