Re: [PATCH 5/7] kbuild: get rid of duplication in *.mod files

Masahiro Yamada <masahiroy@xxxxxxxxxx> · Wed, 13 Apr 2022 17:19:40 +0900

On Sat, Apr 9, 2022 at 5:43 AM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
>
> On Thu, Apr 7, 2022 at 5:08 PM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 8, 2022 at 2:55 AM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 6, 2022 at 8:31 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> > > >
> > > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> > > > index 6ae92d119dfa..f7a30f378e20 100644
> > > > --- a/scripts/Makefile.build
> > > > +++ b/scripts/Makefile.build
> > > > @@ -303,7 +303,8 @@ $(obj)/%.prelink.o: $(obj)/%.o FORCE
> > > >         $(call if_changed,cc_prelink_modules)
> > > >  endif
> > > >
> > > > -cmd_mod = echo $(addprefix $(obj)/, $(call real-search, $*.o, .o, -objs -y -m)) > $@
> > > > +cmd_mod = echo $(addprefix $(obj)/, $(call real-search, $*.o, .o, -objs -y -m)) | \
> > > > +       $(AWK) -v RS='( |\n)' '!x[$$0]++' > $@
> > >
> > > God AWK is unreadable. Any reason we can't use GNU make's sort builtin?
> > > https://www.gnu.org/software/make/manual/html_node/Text-Functions.html
> >
> >
> > I did that in the previous submission.
> > https://lore.kernel.org/lkml/20220405113359.2880241-8-masahiroy@xxxxxxxxxx/
> >
> >
> > After some thoughts, I decided to drop duplicates without sorting.
> >
> > If I alphabetically sorted the object list,
> > 7/7 of this series would be impossible.
> >
> >
> > I am not a big fan of AWK, but I do not know a cleaner way.
> > If you know a better idea, please tell me.
>
> ```
> # stable_dedup.py
> from sys import argv
>
> wordset = set()
> argv.pop(0)
> for word in argv: wordset.add(word)
> for word in wordset: print(word)
> ```
> If that ever shows up in a profile of a kernel build, <set> in C++
> looks pretty similar.  Then that script can be reused in a couple of
> other places, and has a more descriptive name that hints at what it
> does.
>
> Compare that with `$(AWK) -v RS='( |\n)' '!x[$$0]++'`.

As I said, I want to drop duplicates without changing the argument order.

Your Python code does not preserve the order, because it first collects the
arguments into a set() and then prints them in the set's iteration order,
which is arbitrary.

    $ cat stable_dedup.py
    #!/usr/bin/python3
    from sys import argv
    wordset = set()
    argv.pop(0)
    for word in argv: wordset.add(word)
    for word in wordset: print(word)

    $ ./stable_dedup.py  c b a a b
    c
    a
    b

Here, the output I expect is "c b a".

If I were allowed to change the order, I would use
Make's $(sort ...) function or the "sort -u" shell command.
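
For illustration only, the behaviour of the AWK filter '!x[$0]++' can be
modelled in Python as below, next to what a sorting approach would give for
the same input. This is a sketch, not part of the patch; dedup_first_seen is
a hypothetical name used only for this example.

```python
from collections import defaultdict


def dedup_first_seen(words):
    """Keep the first occurrence of each word, like awk's '!x[$0]++' filter."""
    seen = defaultdict(int)  # plays the role of the awk array x
    kept = []
    for word in words:
        if not seen[word]:  # '!x[$0]': true only the first time a word appears
            kept.append(word)
        seen[word] += 1  # '++': bump the per-word counter after the test
    return kept


words = "c b a a b".split()
print(dedup_first_seen(words))  # ['c', 'b', 'a']  (input order kept)
print(sorted(set(words)))       # ['a', 'b', 'c']  (order changed by sorting)
```

The first line is the order I want in the *.mod file; the second is what
sorting would produce.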

Of course, it is pretty easy to write a Python script
that removes duplicate arguments without changing the order.

    $ cat dedup-by-python
    #!/usr/bin/python3
    import sys
    wordset = set()

    for x in sys.argv[1:]:
        if x not in wordset:
            print(x)
        wordset.add(x)

    $ ./dedup-by-python c b a a b
    c
    b
    a

Even this script looks like a bad approach, though.

Please note that cmd_mod is invoked once per module,
so this runs many times, especially for allmodconfig.

Python has a high startup cost for interpreter initialization.

The AWK implementation is much faster,
as the perf results below show.

[1] AWK implementation

    $ cat test-data.txt
    c b a a b

    $ cat dedup-by-awk
    #!/usr/bin/awk -f
    BEGIN { RS="( |\n)" }
    !x[$0]++ { print($0) }

    # perf stat  -- ./dedup-by-awk < test-data.txt
    c
    b
    a

 Performance counter stats for './dedup-by-awk':

              1.06 msec task-clock                #    0.790 CPUs utilized
                 0      context-switches          #    0.000 /sec
                 0      cpu-migrations            #    0.000 /sec
               201      page-faults               #  189.755 K/sec
         3,671,995      cycles                    #    3.467 GHz
         3,932,770      instructions              #    1.07  insn per cycle
           754,811      branches                  #  712.582 M/sec
            21,154      branch-misses             #    2.80% of all branches
        18,350,660      slots                     #   17.324 G/sec
         4,173,875      topdown-retiring          #     22.7% retiring
         2,230,864      topdown-bad-spec          #     12.2% bad speculation
         5,757,069      topdown-fe-bound          #     31.4% frontend bound
         6,188,850      topdown-be-bound          #     33.7% backend bound

       0.001341605 seconds time elapsed

       0.001476000 seconds user
       0.000000000 seconds sys

[2] Python implementation

    # perf stat  -- ./dedup-by-python   c b a a b
    c
    b
    a

 Performance counter stats for './dedup-by-python c b a a b':

              9.34 msec task-clock                #    0.967 CPUs utilized
                 0      context-switches          #    0.000 /sec
                 0      cpu-migrations            #    0.000 /sec
               756      page-faults               #   80.947 K/sec
        31,045,653      cycles                    #    3.324 GHz
        39,175,531      instructions              #    1.26  insn per cycle
         8,488,886      branches                  #  908.929 M/sec
           326,947      branch-misses             #    3.85% of all branches
       152,587,445      slots                     #   16.338 G/sec
        37,698,074      topdown-retiring          #     24.7% retiring
        32,911,017      topdown-bad-spec          #     21.6% bad speculation
        55,051,156      topdown-fe-bound          #     36.1% frontend bound
        26,927,196      topdown-be-bound          #     17.6% backend bound

       0.009661105 seconds time elapsed

       0.006485000 seconds user
       0.003242000 seconds sys
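
To put the two measurements side by side, the sketch below computes the
Python-to-AWK ratios from the figures above. It assumes that a single perf
run of each script is representative. extra_elapsed_seconds is a
hypothetical helper that scales the per-invocation difference linearly; it
ignores parallel builds and caching, so treat it as a rough estimate only.

```python
# Figures copied from the two perf stat runs above (one run each).
awk = {"task_clock_ms": 1.06, "instructions": 3_932_770, "elapsed_s": 0.001341605}
py = {"task_clock_ms": 9.34, "instructions": 39_175_531, "elapsed_s": 0.009661105}

# Ratio of the Python figure to the AWK figure for each metric.
ratios = {key: py[key] / awk[key] for key in awk}
for key, ratio in ratios.items():
    print(f"{key}: Python / AWK = {ratio:.1f}x")


def extra_elapsed_seconds(n_invocations):
    """Naive linear estimate of the extra elapsed time summed over all
    cmd_mod invocations if Python replaced AWK."""
    return n_invocations * (py["elapsed_s"] - awk["elapsed_s"])
```

On these numbers, one Python invocation costs roughly 9 times the
task-clock and about 7 times the elapsed time of one AWK invocation.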

> --
> Thanks,
> ~Nick Desaulniers

-- 
Best Regards
Masahiro Yamada