Re: Build performance regressions originating from min()/max() macros

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 23, 2024 at 10:59:15PM GMT, Lorenzo Stoakes wrote:
> Arnd reported a significant build slowdown [0], which was bisected to the
> series spanning commit 80fcac55385c ("minmax: relax check to allow
> comparison between unsigned arguments and signed constants") to commit
> 867046cc70277 ("minmax: relax check to allow comparison between unsigned
> arguments and signed constants"), originating from the series "minmax:
> Relax type checks in min() and max()." [1].
>
> I have reproduced this locally, reverting this series and manually fixing
> up all call sites that invoke min()/max() for a simple x86-64 defconfig (+
> some other debug flags I use for debug kernels, I can provide the .config
> if needed).
>
> Arnd noted that the arch/x86/xen/setup.c file was particularly problematic,
> taking 15 (!) seconds to pre-process on his machine, so I also enabled
> CONFIG_XEN to test this and obtained performance numbers with this set/not
> set.
>
> I was able to reproduce this very significant pre-processor time on this
> file, noting that with the series reverted compile time for the file is
> 0.79s, with it in place, it takes 6.90s for a 873.4% slowdown.
>
> I also checked total build times (32-core intel i9-14900KF box):
>
> ## With CONFIG_XEN
>
> ### Reverted minmax code
>
> make 1588.46s user 92.33s system 2430% cpu 1:09.16 total
> make 1598.57s user 93.49s system 2419% cpu 1:09.94 total
> make 1598.99s user 92.49s system 2419% cpu 1:09.91 total
>
> ### Not reverted
>
> make 1639.25s user 96.34s system 2433% cpu 1:11.32 total
> make 1640.34s user 96.01s system 2427% cpu 1:11.54 total
> make 1639.98s user 96.76s system 2436% cpu 1:11.27 total
>
> ## Without CONFIG_XEN
>
> ### Reverted minmax code
>
> make 1524.97s user 89.84s system 2399% cpu 1:07.31 total
> make 1521.01s user 88.99s system 2391% cpu 1:07.32 total
> make 1530.75s user 89.65s system 2389% cpu 1:07.83 total
>
> ### Not reverted
>
> make 1570.64s user 94.09s system 2398% cpu 1:09.41 total
> make 1571.25s user 94.36s system 2401% cpu 1:09.36 total
> make 1568.25s user 93.83s system 2396% cpu 1:09.35 total
>
> Which suggests a worryingly significant slowdown of ~45s with CONFIG_XEN
> enabled and ~35s even without it.
>
> The underlying problems seems to be very large macro expansions, which Arnd
> noted in the xen case originated from the line:
>
> extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
> 		extra_pages, max_pages - max_pfn);
>
> And resulted in the generation of 47 MB (!) of pre-processor output.
>
> It seems a lot of code now relies on the relaxed conditions of the newly
> changed min/max() macros, so the question is - what can we do to address
> these regressions?
>
> [0]:https://social.kernel.org/notice/AkDuGHsn0WuA1g1uD2
> [1]:https://lore.kernel.org/all/b97faef60ad24922b530241c5d7c933c@xxxxxxxxxxxxxxxx/

It seems that (again, all credit to Arnd for his thorough analysis here) a
lot of the underyling issue revolves around the macros need to function
both in scenarios where we absolutely must have a constant value (for
instance, array size) as well as in scenarios where this is not the case.

Arnd also discovered [0] there _relatively_ few call sites that require
this, so maybe a way forward might be to create specific min()/max() macros
for the strictly const case and to fix up the core ones to reduce expansion
at the cost of not being able to use these in these scenarios?

Does this seem viable? Perhaps David you might have thoughts on this?

[0]:https://social.kernel.org/notice/AkFSGxnKlr6Fc5vpuS




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux