Re: Size growth?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]



On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote:
> On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote:
> > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote:
> > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote:
> > > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote:
> > > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote:
> > > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote:
> > > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote:
> > > > > > > > Hi Tom,
> > > > > > > > 
> > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini@xxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > Hey all,
> > > > > > > > >
> > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc
> > > > > > > > > to cbca977ea121.  I'm seeing some rather worrying size increases
> > > > > > > > > however.  For example, on an aarch64 board such as leez-rk3399 we have:
> > > > > > > > >      leez-rk3399    : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696
> > > > > > > > >         u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696)
> > > > > > > > >           function                                   old     new   delta
> > > > > > > > >           do_fdt                                    3992    4308    +316
> > > > > > > > >           fdt_check_header                           276     492    +216
> > > > > > > > >           fdt_add_property_                          388     556    +168
> > > > > > > > >           fdt_ro_probe_                              128     252    +124
> > > > > > > > >           fdt_packblocks_                            176     296    +120
> > > > > > > > >           fdt_open_into                              392     512    +120
> > > > > > > > >           fdt_blocks_misordered_                      96     216    +120
> > > > > > > > >           fdt_get_mem_rsv                            108     220    +112
> > > > > > > > >           fdt_offset_ptr                             104     208    +104
> > > > > > > > >           fdt_get_string                             288     388    +100
> > > > > > > > >           fdt_splice_                                148     228     +80
> > > > > > > > >           fdt_valid                                  204     276     +72
> > > > > > > > >           fdt_getprop_by_offset                      184     256     +72
> > > > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > > > >           fdt_num_mem_rsv                             60     116     +56
> > > > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > > > >           fdt_rw_probe_                              108     156     +48
> > > > > > > > >           fdt_mem_rsv                                 60     108     +48
> > > > > > > > >           fdt_getprop_namelen                        100     148     +48
> > > > > > > > >           fdt_get_property_by_offset_                100     148     +48
> > > > > > > > >           fdt_get_name                               164     212     +48
> > > > > > > > >           efi_install_fdt                            964    1012     +48
> > > > > > > > >           boot_get_fdt                               888     936     +48
> > > > > > > > >           fdt_move                                    80     116     +36
> > > > > > > > >           reserve_fdt                                 72      96     +24
> > > > > > > > >           reloc_fdt                                   76     100     +24
> > > > > > > > >           image_setup_libfdt                         284     308     +24
> > > > > > > > >           fit_image_load                            1608    1632     +24
> > > > > > > > >           fit_image_get_data_and_size                172     196     +24
> > > > > > > > >           fit_get_end                                 16      40     +24
> > > > > > > > >           fdt_next_tag                               256     280     +24
> > > > > > > > >           fdt_header_size                             12      36     +24
> > > > > > > > >           fdt_get_property_namelen_                  212     236     +24
> > > > > > > > >           fdt_get_property_namelen                    44      68     +24
> > > > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > > > >           fdt_del_node                                96     120     +24
> > > > > > > > >           fdt_del_mem_rsv                            112     136     +24
> > > > > > > > >           fdt_add_subnode_namelen                    284     308     +24
> > > > > > > > >           fdt_add_mem_rsv                            124     148     +24
> > > > > > > > >           common_diskboot                            680     704     +24
> > > > > > > > >           fdt_next_subnode                            80      88      +8
> > > > > > > > >         spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116)
> > > > > > > > >           function                                   old     new   delta
> > > > > > > > >           fdt_get_mem_rsv                             52     236    +184
> > > > > > > > >           fdt_add_property_                          340     484    +144
> > > > > > > > >           fdt_get_name                                68     152     +84
> > > > > > > > >           fdt_splice_                                148     228     +80
> > > > > > > > >           fdt_num_mem_rsv                             64     128     +64
> > > > > > > > >           fdt_splice_mem_rsv_                         96     152     +56
> > > > > > > > >           fdt_offset_ptr                              52     104     +52
> > > > > > > > >           fdt_splice_struct_                          96     144     +48
> > > > > > > > >           fdt_shrink_to_minimum                      220     268     +48
> > > > > > > > >           fdt_get_property_by_offset_                 36      84     +48
> > > > > > > > >           fdt_ro_probe_                                -      36     +36
> > > > > > > > >           fdt_subnode_offset_namelen                 200     232     +32
> > > > > > > > >           spl_load_simple_fit                        848     872     +24
> > > > > > > > >           spl_fit_append_fdt                         196     220     +24
> > > > > > > > >           fdt_getprop_by_offset                       80     104     +24
> > > > > > > > >           fdt_get_string                              64      88     +24
> > > > > > > > >           fdt_get_property_namelen_                  200     224     +24
> > > > > > > > >           fdt_get_phandle                            120     144     +24
> > > > > > > > >           fdt_del_mem_rsv                            104     128     +24
> > > > > > > > >           fdt_check_header                            28      52     +24
> > > > > > > > >           fdt_add_subnode_namelen                    260     284     +24
> > > > > > > > >           fdt_add_mem_rsv                            112     136     +24
> > > > > > > > >           fdt_next_tag                               192     212     +20
> > > > > > > > >           fdt_path_offset_namelen                    240     256     +16
> > > > > > > > >           fdt_supernode_atdepth_offset               160     172     +12
> > > > > > > > >           fdt_node_offset_by_phandle                 120     132     +12
> > > > > > > > >           fdt_mem_rsv                                 20       -     -20
> > > > > > > > >           fdt_check_prop_offset_                      64      44     -20
> > > > > > > > >           fdt_check_node_offset_                      64      44     -20
> > > > > > > > >
> > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to
> > > > > > > > > 0xff so there's maximum savings already being done there.  Does anyone
> > > > > > > > > have ideas on where / how to further tweak code size?  Thanks!
> > > > > > > > 
> > > > > > > > +David Gibson
> > > > > > > > 
> > > > > > > > I suspect there are more checks that need to be made conditional.
> > > > > > > 
> > > > > > > Seems likely.
> > > > > > 
> > > > > > OK.  Does that mean you're going to take a look?
> > > > > 
> > > > > No, sorry, my time for dtc is very limited.
> > > > 
> > > > OK, fair point.
> > > > 
> > > > > > > Though, as I've opined before, from what I understand the SPL is *so*
> > > > > > > restricted an environment, I'm not really convinced DTs are the right
> > > > > > > tool for the job there.
> > > > > > 
> > > > > > SPL is space constrained, which is why we mask out all of the safety
> > > > > > checks.  But we still generally have enough memory that this is fine.
> > > > > > This specific board has 256KiB for SPL, for example.  But I'm not just
> > > > > > concerned about 1KiB of growth when everything is masked off, I'm also
> > > > > > concerned about 2KiB of growth over a changelog that doesn't read like
> > > > > > it added a bunch of stuff that should cause everything to grow,
> > > > > > either.
> > > > > 
> > > > > Yeah, that's a good point.  It's certainly not obvious to me what
> > > > > would have caused the growth.  I think we'll need a revision by
> > > > > revision test to see where it was added.
> > > > 
> > > > So, with a bit of playing around, I've used buildman to give us that,
> > > > for the leez-rk3399 platform.  The full results are at
> > > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note
> > > > that build #6, "scripts/dtc: Update to upstream version
> > > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync
> > > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has).
> > > 
> > > Right.. I was meaning stepping through the dtc upstream revisions, not
> > > just the uboot revisions, so you get a fine grained idea of where the
> > > increase is coming from.
> > 
> > Yes, that's what that link is.  I imported every revision from point A
> > to point B, and built U-Boot for it, and checked the sizes, since we
> > have tooling for that.  It's possible the bloat-o-meter script in the
> > kernel could be used, but I'm not sure what the right target(s) in dtc
> > would be, to get any useful information out.
> 
> Ok.. ok.  I might need some guidance to interpret the information
> there on that link.

You can ignore everything before line 78.  If we look at lines 86-100
there is:
09: scripts/dtc: Update to upstream version v1.4.6-25-g04b5b4062ccd
   aarch64: (for 1/1 boards) all +4.0 spl/u-boot-spl:all +4.0 spl/u-boot-spl:text +4.0 text +4.0
            leez-rk3399    : all +4 spl/u-boot-spl:all +4 spl/u-boot-spl:text +4 text +4
               u-boot: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4)
                 function                                   old     new   delta
                 fdt_ro_probe_                                -     116    +116
                 fdt_rw_probe_                                -     108    +108
                 fdt_rw_check_header_                       108       -    -108
                 fdt_check_header                           116       4    -112
               spl-u-boot-spl: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4)
                 function                                   old     new   delta
                 fdt_ro_probe_                                -     116    +116
                 fdt_rw_probe_                                -     108    +108
                 fdt_rw_check_header_                       108       -    -108
                 fdt_check_header                           116       4    -112

Which means that with dtc commit 04b5b4062ccd, we got the above function
size (text) changes.  Overall it's a 4 byte growth for "libfdt: Clean up
header checking functions" which at the end of the day, yes, that's a
fine and reasonable change and nothing with my U-Boot hat on I'm
concerned about.

Where it gets complicated is that it's not until line 722 where we get
to the end of where Simon's ASSUME_... changes were fully merged.  So
it's a bit more detective work to see where for example fdt_get_mem_rsv
was grown, but then not shrunk back with the ASSUME flags.  A quick
search shows there's 5 dtc commits that change fdt_get_mem_rsv.  My hope
is that it would be quicker for you or someone else that works on dtc a
lot to mentally swap those commits in and see "Oh, I guess we could
ASSUME_... over in this hunk" or "Oh, maybe we can rewrite this change
to be a bit more compact" easier than I :)

> > > > But what does all of this _mean_ ?  I kinda think I have an answer now.
> > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and
> > > > 11738cf01f15 reduces it just a little.
> > > 
> > > Ah, that's a tricky one.  If we don't handle unaligned accesses we
> > > instead get intermittent bug reports where it just crashes.
> > 
> > We really need to talk about that then.  There was a problem of people
> > turning off the sanity check for making sure the entire device tree was
> > aligned and then having everything crash.
> 
> Ok... I'm not really sure where you're going with that thought.

In my reading of the mailing list history of how this issue came up,
it was someone was booting a dragonboard or something, and they (or
rather, the board maintainer set by default) the flag to use the device
tree wherever it is in memory and NOT to relocate it to a properly
aligned address.  This in turn lead to the kernel getting an unaligned
device tree and everything crashing.  The "I know what I'm doing" flag
was set, violated the documented requirements for device trees need to
reside in memory and everything blew up.

After that it was noticed that there could be some internal
mis-alignment and if you tried those accesses on a CPU that doesn't
support doing those reads easily there could be problems, but that's not
a common at all case (as noted by it not having been seen in practice).

> > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just
> > > break for either an unaligned dtb (unlikely) or if you attempt to load
> > > an unaligned value from a property (more likely, but don't add the
> > > flag if you're not sure you don't need it).
> > 
> > So long as it's abstracted in such a way that we don't grow the size of
> > everything again, yes, that is the right way forward I think.
> 
> All the ASSUME flags should be resolved at compile time (at least with
> normal optimization levels enabled in the compiler), so testing for
> those shouldn't increase size at all.  If they do, something is wrong.

I'm saying that how ever this new ASSUME flag is done, it needs to be
done in such a way the compiler really will be smart about it.  So
something like making a new function that does fdt64_ld() if we aren't
ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are
ASSUME_ALIGNED_ACCESS.

> > These
> > tests still introduce a big boot-time slowdown as well.
> 
> I'm not sure exactly what tests you mean.

Sorry, s/tests/changes/.  I asked Patrice to re-run the boot time tests
on a resync to exactly what's in upstream dtc and he still sees the very
noticeable slowdown.

-- 
Tom

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Device Tree]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux