On Fri, Oct 23, 2020 at 01:58:04AM +1100, David Gibson wrote: > On Thu, Oct 22, 2020 at 08:32:54AM -0400, Tom Rini wrote: > > On Thu, Oct 22, 2020 at 03:00:13PM +1100, David Gibson wrote: > > > On Wed, Oct 21, 2020 at 06:49:14PM -0400, Tom Rini wrote: > > > > On Tue, Oct 20, 2020 at 01:09:07PM +1100, David Gibson wrote: > > > > > On Mon, Oct 19, 2020 at 08:37:00AM -0400, Tom Rini wrote: > > > > > > On Mon, Oct 19, 2020 at 12:42:13PM +1100, David Gibson wrote: > > > > > > > On Fri, Oct 16, 2020 at 03:46:41PM -0600, Simon Glass wrote: > > > > > > > > Hi Tom, > > > > > > > > > > > > > > > > On Fri, 16 Oct 2020 at 13:36, Tom Rini <trini@xxxxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > > > Hey all, > > > > > > > > > > > > > > > > > > With my U-Boot hat on, I'm trying to move us up from 84e414b0b5bc in dtc > > > > > > > > > to cbca977ea121. I'm seeing some rather worrying size increases > > > > > > > > > however. For example, on an aarch64 board such as leez-rk3399 we have: > > > > > > > > > leez-rk3399 : all +2696 spl/u-boot-spl:all +1116 spl/u-boot-spl:text +1116 text +2696 > > > > > > > > > u-boot: add: 0/0, grow: 42/0 bytes: 2696/0 (2696) > > > > > > > > > function old new delta > > > > > > > > > do_fdt 3992 4308 +316 > > > > > > > > > fdt_check_header 276 492 +216 > > > > > > > > > fdt_add_property_ 388 556 +168 > > > > > > > > > fdt_ro_probe_ 128 252 +124 > > > > > > > > > fdt_packblocks_ 176 296 +120 > > > > > > > > > fdt_open_into 392 512 +120 > > > > > > > > > fdt_blocks_misordered_ 96 216 +120 > > > > > > > > > fdt_get_mem_rsv 108 220 +112 > > > > > > > > > fdt_offset_ptr 104 208 +104 > > > > > > > > > fdt_get_string 288 388 +100 > > > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > > > fdt_valid 204 276 +72 > > > > > > > > > fdt_getprop_by_offset 184 256 +72 > > > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > > > fdt_num_mem_rsv 60 116 +56 > > > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > > > fdt_rw_probe_ 108 156 +48 > > > > > > > > > fdt_mem_rsv 60 108 +48 > > > > > > > > > fdt_getprop_namelen 100 148 +48 > > > > > > > > > fdt_get_property_by_offset_ 100 148 +48 > > > > > > > > > fdt_get_name 164 212 +48 > > > > > > > > > efi_install_fdt 964 1012 +48 > > > > > > > > > boot_get_fdt 888 936 +48 > > > > > > > > > fdt_move 80 116 +36 > > > > > > > > > reserve_fdt 72 96 +24 > > > > > > > > > reloc_fdt 76 100 +24 > > > > > > > > > image_setup_libfdt 284 308 +24 > > > > > > > > > fit_image_load 1608 1632 +24 > > > > > > > > > fit_image_get_data_and_size 172 196 +24 > > > > > > > > > fit_get_end 16 40 +24 > > > > > > > > > fdt_next_tag 256 280 +24 > > > > > > > > > fdt_header_size 12 36 +24 > > > > > > > > > fdt_get_property_namelen_ 212 236 +24 > > > > > > > > > fdt_get_property_namelen 44 68 +24 > > > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > > > fdt_del_node 96 120 +24 > > > > > > > > > fdt_del_mem_rsv 112 136 +24 > > > > > > > > > fdt_add_subnode_namelen 284 308 +24 > > > > > > > > > fdt_add_mem_rsv 124 148 +24 > > > > > > > > > common_diskboot 680 704 +24 > > > > > > > > > fdt_next_subnode 80 88 +8 > > > > > > > > > spl-u-boot-spl: add: 1/-1, grow: 25/-2 bytes: 1176/-60 (1116) > > > > > > > > > function old new delta > > > > > > > > > fdt_get_mem_rsv 52 236 +184 > > > > > > > > > fdt_add_property_ 340 484 +144 > > > > > > > > > fdt_get_name 68 152 +84 > > > > > > > > > fdt_splice_ 148 228 +80 > > > > > > > > > fdt_num_mem_rsv 64 128 +64 > > > > > > > > > fdt_splice_mem_rsv_ 96 152 +56 > > > > > > > > > fdt_offset_ptr 52 104 +52 > > > > > > > > > fdt_splice_struct_ 96 144 +48 > > > > > > > > > fdt_shrink_to_minimum 220 268 +48 > > > > > > > > > fdt_get_property_by_offset_ 36 84 +48 > > > > > > > > > fdt_ro_probe_ - 36 +36 > > > > > > > > > fdt_subnode_offset_namelen 200 232 +32 > > > > > > > > > spl_load_simple_fit 848 872 +24 > > > > > > > > > spl_fit_append_fdt 196 220 +24 > > > > > > > > > fdt_getprop_by_offset 80 104 +24 > > > > > > > > > fdt_get_string 64 88 +24 > > > > > > > > > fdt_get_property_namelen_ 200 224 +24 > > > > > > > > > fdt_get_phandle 120 144 +24 > > > > > > > > > fdt_del_mem_rsv 104 128 +24 > > > > > > > > > fdt_check_header 28 52 +24 > > > > > > > > > fdt_add_subnode_namelen 260 284 +24 > > > > > > > > > fdt_add_mem_rsv 112 136 +24 > > > > > > > > > fdt_next_tag 192 212 +20 > > > > > > > > > fdt_path_offset_namelen 240 256 +16 > > > > > > > > > fdt_supernode_atdepth_offset 160 172 +12 > > > > > > > > > fdt_node_offset_by_phandle 120 132 +12 > > > > > > > > > fdt_mem_rsv 20 - -20 > > > > > > > > > fdt_check_prop_offset_ 64 44 -20 > > > > > > > > > fdt_check_node_offset_ 64 44 -20 > > > > > > > > > > > > > > > > > > And note that for the SPL case we're already setting ASSUME_MASK to > > > > > > > > > 0xff so there's maximum savings already being done there. Does anyone > > > > > > > > > have ideas on where / how to further tweak code size? Thanks! > > > > > > > > > > > > > > > > +David Gibson > > > > > > > > > > > > > > > > I suspect there are more checks that need to be made conditional. > > > > > > > > > > > > > > Seems likely. > > > > > > > > > > > > OK. Does that mean you're going to take a look? > > > > > > > > > > No, sorry, my time for dtc is very limited. > > > > > > > > OK, fair point. > > > > > > > > > > > Though, as I've opined before, from what I understand the SPL is *so* > > > > > > > restricted an environment, I'm not really convinced DTs are the right > > > > > > > tool for the job there. > > > > > > > > > > > > SPL is space constrained, which is why we mask out all of the safety > > > > > > checks. But we still generally have enough memory that this is fine. > > > > > > This specific board has 256KiB for SPL, for example. But I'm not just > > > > > > concerned about 1KiB of growth when everything is masked off, I'm also > > > > > > concerned about 2KiB of growth over a changelog that doesn't read like > > > > > > it added a bunch of stuff that should cause everything to grow, > > > > > > either. > > > > > > > > > > Yeah, that's a good point. It's certainly not obvious to me what > > > > > would have caused the growth. I think we'll need a revision by > > > > > revision test to see where it was added. > > > > > > > > So, with a bit of playing around, I've used buildman to give us that, > > > > for the leez-rk3399 platform. The full results are at > > > > https://gist.github.com/trini/e42cc848d38bfc77e0124d3ace86d95d and note > > > > that build #6, "scripts/dtc: Update to upstream version > > > > v1.4.6-22-g24b1f3f064d4" is the first interesting one (#5 was a resync > > > > entirely to dtc's 84e414b0b5bc rather than strictly what U-Boot has). > > > > > > Right.. I was meaning stepping through the dtc upstream revisions, not > > > just the uboot revisions, so you get a fine grained idea of where the > > > increase is coming from. > > > > Yes, that's what that link is. I imported every revision from point A > > to point B, and built U-Boot for it, and checked the sizes, since we > > have tooling for that. It's possible the bloat-o-meter script in the > > kernel could be used, but I'm not sure what the right target(s) in dtc > > would be, to get any useful information out. > > Ok.. ok. I might need some guidance to interpret the information > there on that link. You can ignore everything before line 78. If we look at lines 86-100 there is: 09: scripts/dtc: Update to upstream version v1.4.6-25-g04b5b4062ccd aarch64: (for 1/1 boards) all +4.0 spl/u-boot-spl:all +4.0 spl/u-boot-spl:text +4.0 text +4.0 leez-rk3399 : all +4 spl/u-boot-spl:all +4 spl/u-boot-spl:text +4 text +4 u-boot: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4) function old new delta fdt_ro_probe_ - 116 +116 fdt_rw_probe_ - 108 +108 fdt_rw_check_header_ 108 - -108 fdt_check_header 116 4 -112 spl-u-boot-spl: add: 2/-1, grow: 0/-1 bytes: 224/-220 (4) function old new delta fdt_ro_probe_ - 116 +116 fdt_rw_probe_ - 108 +108 fdt_rw_check_header_ 108 - -108 fdt_check_header 116 4 -112 Which means that with dtc commit 04b5b4062ccd, we got the above function size (text) changes. Overall it's a 4 byte growth for "libfdt: Clean up header checking functions" which at the end of the day, yes, that's a fine and reasonable change and nothing with my U-Boot hat on I'm concerned about. Where it gets complicated is that it's not until line 722 where we get to the end of where Simon's ASSUME_... changes were fully merged. So it's a bit more detective work to see where for example fdt_get_mem_rsv was grown, but then not shrunk back with the ASSUME flags. A quick search shows there's 5 dtc commits that change fdt_get_mem_rsv. My hope is that it would be quicker for you or someone else that works on dtc a lot to mentally swap those commits in and see "Oh, I guess we could ASSUME_... over in this hunk" or "Oh, maybe we can rewrite this change to be a bit more compact" easier than I :) > > > > But what does all of this _mean_ ? I kinda think I have an answer now. > > > > One of the things that sticks out is 6dcb8ba408ec adds a lot and > > > > 11738cf01f15 reduces it just a little. > > > > > > Ah, that's a tricky one. If we don't handle unaligned accesses we > > > instead get intermittent bug reports where it just crashes. > > > > We really need to talk about that then. There was a problem of people > > turning off the sanity check for making sure the entire device tree was > > aligned and then having everything crash. > > Ok... I'm not really sure where you're going with that thought. In my reading of the mailing list history of how this issue came up, it was someone was booting a dragonboard or something, and they (or rather, the board maintainer set by default) the flag to use the device tree wherever it is in memory and NOT to relocate it to a properly aligned address. This in turn lead to the kernel getting an unaligned device tree and everything crashing. The "I know what I'm doing" flag was set, violated the documented requirements for device trees need to reside in memory and everything blew up. After that it was noticed that there could be some internal mis-alignment and if you tried those accesses on a CPU that doesn't support doing those reads easily there could be problems, but that's not a common at all case (as noted by it not having been seen in practice). > > > I suppose we could add an ASSUME_ALIGNED_ACCESS flag, and it will just > > > break for either an unaligned dtb (unlikely) or if you attempt to load > > > an unaligned value from a property (more likely, but don't add the > > > flag if you're not sure you don't need it). > > > > So long as it's abstracted in such a way that we don't grow the size of > > everything again, yes, that is the right way forward I think. > > All the ASSUME flags should be resolved at compile time (at least with > normal optimization levels enabled in the compiler), so testing for > those shouldn't increase size at all. If they do, something is wrong. I'm saying that how ever this new ASSUME flag is done, it needs to be done in such a way the compiler really will be smart about it. So something like making a new function that does fdt64_ld() if we aren't ASSUME_ALIGNED_ACCESS and fdt64_to_cpu() if we are ASSUME_ALIGNED_ACCESS. > > These > > tests still introduce a big boot-time slowdown as well. > > I'm not sure exactly what tests you mean. Sorry, s/tests/changes/. I asked Patrice to re-run the boot time tests on a resync to exactly what's in upstream dtc and he still sees the very noticeable slowdown. -- Tom
Attachment:
signature.asc
Description: PGP signature