Re: [PATCH v2 0/2] Cleanup io.h

"Arnd Bergmann" <arnd@xxxxxxxx> · Fri, 21 Feb 2025 18:15:30 +0100

On Fri, Feb 21, 2025, at 17:50, Andy Shevchenko wrote:
> On Fri, Feb 21, 2025 at 11:15:47AM +0100, Arnd Bergmann wrote:
>> As you already found, removing an old indirect #include that is
>> no longer needed usually leads to some files breaking. The more
>> impactful your change is in terms of build speed, the more
>> things break! I think in this case, removing linux/err.h and
>> linux/bug.h made very little difference because they are very
>> small files in terms of what else they include.
>
> While this is all true, removing unneeded inclusions rarely can lead to the
> "extra work with a little gain". When there is a replacement to the low
> level ones, it's also an improvement in my opinion and won't be harmful in
> the future. But I agree, that the stuff is way too tangled already and requires
> an enormous work to untangle it, even if doing it structurally.

The problem I see with prematurely applying small improvements like this
one is that they always cause build regressions, at least if the change
is any good. If we can find some more impactful changes like this one,
we can group them together in a branch and test them a lot better before
they even reach linux-next.

I mainly want to avoid people getting angry at Raag for repeatedly
breaking their subsystems by pushing small patches one at a time.

> Do you have your scripts for the showed statistics being published somewhere?

I had a good set of scripts on an older machine and might still
have some backups of that somewhere, but I just hacked up something
ad-hoc today based on what I remembered from that time. Here
are the snippets that you might find useful.

A patch to Kbuild to create a list of each included header for each
object file built in a given configuration (similar to the .filename.o.d
files, but in a format I found more convenient):

--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -307,7 +307,8 @@ cmd_ld_single = $(if $(objtool-enabled)$(is-single-obj-m), ; $(LD) $(ld_flags) -
 endif
 
 quiet_cmd_cc_o_c = CC $(quiet_modtag)  $@
-      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< \
+      cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $< ; \
+                   $(CC) $(c_flags) -E -o - $< | grep ^\#.*include | cut -f 2 -d\" | sort -u > $@.includes \
                $(cmd_ld_single) \
                $(cmd_objtool)
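
To illustrate what the extra pipeline stage in that patch keeps, here is a
minimal sketch that runs the same grep/cut/sort filter over hand-written
linemarkers of the kind "gcc -E" emits. The list_includes name and the
sample paths are made up for the example and are not part of the patch.

```shell
# Hypothetical helper: the same filter as in the patch, reading
# preprocessor output on stdin.
list_includes() {
	grep '^#.*include' | cut -f 2 -d'"' | sort -u
}

# Made-up linemarkers; a real run would get them from "$(CC) -E".
printf '%s\n' \
	'# 1 "drivers/foo/bar.c"' \
	'# 1 "./include/linux/io.h" 1' \
	'int x;' \
	'# 1 "./include/linux/err.h" 1' \
	'# 5 "./include/linux/io.h" 2' | list_includes
```

This prints ./include/linux/err.h and ./include/linux/io.h once each. Note
that the grep matches any linemarker with "include" in its path, so a source
file living in such a directory would show up as well.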
 
A shell one-liner to find the header files most commonly listed in
those .includes files:

$ find -name \*includes | xargs cat | sort | uniq -c | sed -e 's:\./\|\././::g' | sort -rn | head -n 1000 > mostincluded

A one-liner to preprocess each of those headers:

$ cat mostincluded | grep include/linux | while read a i ; do gcc -E $i -o ${i%.h}.i ${GCCARGS} ; done

A one-liner to sort by the product of include count and preprocessed line count:

$ cat mostincluded | grep include/linux/ | while read a b ; do if [ -e ${b%.h}.i ] ; then echo $a $(wc -l ${b%.h}.i) ; fi ; done | sort -n -k2 | while read a b c ; do echo $((a * b)) $a $b $c ; done | sort -nr > fulllist
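
The ranking step on its own can be sketched as below, assuming input lines
of the form "count lines path". The rank_headers name and the numbers are
invented for illustration and are not measurements.

```shell
# Hypothetical helper: read "count lines path" on stdin and print
# "count*lines count lines path", largest product first.
rank_headers() {
	while read -r count lines path ; do
		echo "$((count * lines)) $count $lines $path"
	done | sort -nr
}

# Made-up numbers: a small header included often versus a large
# header included less often.
printf '%s\n' \
	'900 120 include/linux/err.h' \
	'400 30000 include/linux/io.h' | rank_headers
```

With these inputs the larger header comes out on top (12000000 against
108000) even though it is included less often, which is the reason for
ranking by the product rather than by the include count alone.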

In the old days, I had cleaner versions of these in an automated script,
and produced a .dot file to visualize the dependencies with graphviz.
I did get to the point of more than doubling compile speed, so there was
a clear incentive to continue. In fact, the further I got along the way,
the better the savings. In the end I gave up because I could not
find a useful subset to upstream first that wouldn't already break
hundreds of drivers.

The best idea I have to avoid that is to pick one header to clean up
from my list and do all the prerequisites but not actually break anything
at first.

      Arnd