Re: [PATCH] coccicheck: process every source file at once

Jacob Keller <jacob.keller@xxxxxxxxx> · Tue, 2 Oct 2018 13:00:21 -0700

On Tue, Oct 2, 2018 at 12:55 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Oct 02, 2018 at 12:16:42PM -0700, Jacob Keller wrote:
>
> > make coccicheck is used in order to apply coccinelle semantic patches,
> > and see if any of the transformations found within contrib/coccinelle/
> > can be applied to the current code base.
> >
> > Pass every file to a single invocation of spatch, instead of running
> > spatch once per source file.
> >
> > This reduces the time required to run make coccicheck by a significant
> > amount of time:
> >
> > Prior timing of make coccicheck
> >   real    6m14.090s
> >   user    25m2.606s
> >   sys     1m22.919s
> >
> > New timing of make coccicheck
> >   real    1m36.580s
> >   user    7m55.933s
> >   sys     0m18.219s
>
> Yay! This is a nice result.
>
> It's also one of the things that Julia suggested in an earlier thread.
> One thing I wasn't quite sure about after digging into the various
> versions (1.0.4 on Debian stable/unstable, 1.0.6 in experimental, and
> pre-1.0.7 at the bleeding edge) was whether the old versions would be
> similarly helped (or work at all).
>
> I just replicated your results with 1.0.4.deb-3+b2 from Debian stable.
> It's possible there are older versions floating around, but for
> something developer-only like this, I think "in Debian stable" is a
> reasonable enough cutoff.
>

Good. I hadn't checked back too far, but I know support for multiple
files has existed since quite a while.

> > This is nearly a 4x decrease in the time required to run make
> > coccicheck. This is due to the overhead of restarting spatch for every
> > file. By processing all files at once, we can amortize this startup cost
> > across the total number of files, rather than paying it once per file.
>
> One minor side effect is that we lose the opportunity to parallelize
> quite as much. However, I think the reduction in total CPU makes it well
> worth that (and we still have 8 cocci files and growing, which should
> keep at least 8 cores busy).
>

I don't think we do any less than previously, because we are doing
each file in a for-loop, so those would all be serial anyways.

> I think recent versions of Coccinelle will actually parallelize
> internally, too, but my 1.0.4 doesn't seem to. That's probably what I
> was thinking of earlier (but this is still a win even without that).
>
> It looks like this also fixes a problem I ran into when doing the oideq
> work, which is that the patch for a header file would get shown multiple
> times (once for each file that includes it). So this is faster _and_
> more correct. Double-yay.
>

Woot :D

> > diff --git a/Makefile b/Makefile
> > index df1df9db78da..b9947f3f51ec 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -2715,10 +2715,8 @@ endif
> >  %.cocci.patch: %.cocci $(COCCI_SOURCES)
> >       @echo '    ' SPATCH $<; \
> >       ret=0; \
> > -     for f in $(COCCI_SOURCES); do \
> > -             $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> > -                     { ret=$$?; break; }; \
> > -     done >$@+ 2>$@.log; \
> > +     ( $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) || \
> > +             { ret=$$?; }; ) >$@+ 2>$@.log; \
>
> This looks pretty straight-forward. I wondered if we could get rid of
> the "ret" handling, since we don't need to pass the error back out of
> the loop. But it's also used for the "show the log only on error" logic
> below:
>

We could probably get rid of it by doing the spatch without an ||, and
then just save $?.

> >       if test $$ret != 0; \
> >       then \
> >               cat $@.log; \
>
> The subshell could be a {}, though, I think, but it's not that big a
> deal either way.
>
> -Peff