On Fri, Mar 05 2021, Jeff King wrote: I sent out a v2 that should address your feedback in : https://lore.kernel.org/git/20210305170724.23859-1-avarab@xxxxxxxxx/ Just brief notes: > On Tue, Mar 02, 2021 at 09:51:03PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> What we were doing before was processing all our *.c files, and for >> each of those *.c files we'd recursively look around for includes and >> see if we needed to search/replace in those too. >> >> That we did that dates back to [1] when we were only processing *.c >> files, and it was always very redundant. We'd e.g. visit the likes of >> strbuf.h lots of times since it's widely used as an include. > > OK. I don't offhand know if processing the includes along with the C > files buys us anything else. Coccinelle's behavior is often quite a > mystery, but if we think this produces the same results with less time, > I'm for it. > > BTW, this "dates back to when we were only processing *.c files" > statement confused me a bit. Doesn't that include the current state > before this patch? I.e., this hunk: > >> -FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES))) >> +FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES))) > > seems to be an important part of the change, without which moving away > from --all-includes would be wrong. Updated in the new 2/4 commit message. >> Then in the most recent attempt to optimize coccicheck in [2] this >> anti-pattern finally turned into a bug. >> >> Namely: before this change, if your coccicheck rule applied to >> e.g. making a change in strbuf.h itself we'd get *lots* of duplicate >> hunks applying the exact same change, as concurrent spatch processes >> invoked by xargs raced one another. In one instance I ended up with 27 >> copies of the same hunk in a strbuf.patch. >> >> Setting SPATCH_BATCH_SIZE=0 and processing all the files in one giant >> batch mitigated this. I suspect the author of [2] either mostly ran in >> that mode, or didn't test on changes that impacted widely used header >> files. > > This "turned into a bug" I didn't understand, though. That commit [2] > just switched from always feeding files one at a time to letting you > feed more than one. So yes, feeding all at once (with > SPATCH_BATCH_SIZE=0) mitigated the bug. But wouldn't any bug that shows > up with SPATCH_BATCH_SIZE=1 have already existed previously? > > IOW, I don't think the batch-size stuff made anything worse. It made one > specific case better, but that was not even its purpose. > > That is maybe splitting hairs, but I want to make sure I understand all > of what you're claiming here. But also... Explained in the new 2/4. Your commit introduced a regression, because spatch does racy magic background locking, running N of them at a time applying the same rule tripped it up. I suspect it was racy before, but we didn't have overlapping rules in multiple *.cocci files. >> I'm entirely removing SPATCH_BATCH_SIZE. If you want to tweak it you >> can tweak SPATCH_XARGS_FLAGS to e.g. "-n 256", or "-P 4 -n 128". But >> in my testing it isn't worth it to tweak SPATCH_XARGS_FLAGS for a full >> "make coccicheck". > > You hard-code this to 32 now. But I'm not sure if that's going to be the > universal best value. > > Applying just part of your patch, but leaving SPATCH_BATCH_SIZE in > place, like so: > > diff --git a/Makefile b/Makefile > index dfb0f1000f..88cb157547 100644 > --- a/Makefile > +++ b/Makefile > @@ -1201,7 +1201,7 @@ SP_EXTRA_FLAGS = -Wno-universal-initializer > # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will > # usually result in less CPU usage at the cost of higher peak memory. > # Setting it to 0 will feed all files in a single spatch invocation. > -SPATCH_FLAGS = --all-includes --patch . > +SPATCH_FLAGS = --no-includes --patch . > SPATCH_BATCH_SIZE = 1 > > include config.mak.uname > @@ -2859,7 +2859,7 @@ check: config-list.h command-list.h > exit 1; \ > fi > > -FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES))) > +FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES))) > COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES)) > > %.cocci.patch: %.cocci $(COCCI_SOURCES) > > Then I get this with various batch sizes (using "make -j16" on an 8-core > machine, so all of the patches are being run at the same time): > > size | real | user | sys > ------------------------------- > 1 | 1m28s | 10m00s | 0m56s > 2 | 1m03s | 7m49s | 0m33s > 4 | 0m51s | 6m28s | 0m20s > 8 | 0m45s | 6m02s | 0m14s > 16 | 0m45s | 6m17s | 0m10s > 32 | 0m44s | 6m33s | 0m07s > 64 | 0m45s | 6m48s | 0m06s > 0 | 1m08s | 10m08s | 0m03s > > So there's a sweet spot at 8. Doing "32" isn't that much worse (runtime > is about the same, but we use more CPU), but it gets progressively worse > as the batch size increases. > > That's the same sweet spot before your patch, too, btw. I know because I > happened to be timing it the other day, as coccinelle 1.1.0 hit debian > unstable. When I originally wrote the SPATCH_BATCH_SIZE feature, I think > I was using 1.0.4 or 1.0.7. Back then SPATCH_BATCH_SIZE=0 was a clear > win, assuming you had the memory. But in 1.0.8 it ran for many minutes > without finishing. I found back then that "2" was the sweet spot. But > now it's "8". > > All of which is to say: the timing difference between my "8" and "32" > cases isn't that exciting. But the performance of spatch has been > sufficiently baffling to me that I think it's worth keeping the knob. > > Your XARGS_FLAGS does accomplish something similar (and is in fact more > flexible, though at the cost of being less abstract). I'm OK to replace > one with the other, but shouldn't it happen in a separate patch? It's > completely orthogonal to the --no-includes behavior. *nod*, set to a default of 8 in the new 4/4. >> I'm also the whole "cat $@.log" introduced in [3]. Since we don't call >> this in a loop anymore (and xargs will early-exit) we can just rely on >> standard V=1 for debugging issues. > > I think this is missing the point of the cat. We've redirected spatch's > stderr to the logfile. So if there's an error, you have no idea what it > was without manually figuring out which file to cat. And V=1 doesn't > help that. > > We could stop doing that redirection, but the problem is that spatch is > very verbose. So the point was to store the output but only dump it > when we see an error. > > So this part of the patch seems like a pure regression to me. E.g., > introduce a syntax error like: > > diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci > index 4490069df9..c6c6562a0a 100644 > --- a/contrib/coccinelle/free.cocci > +++ b/contrib/coccinelle/free.cocci > @@ -11,7 +11,7 @@ expression E; > free(E); > > @@ > -expression E; > +expression E > @@ > - free(E); > + FREE_AND_NULL(E); > > and run "make coccicheck" before and after your patch: > > [before] > $ make coccicheck > GIT_VERSION = 2.31.0.rc1.8.gbe7935ed8b.dirty > SPATCH contrib/coccinelle/free.cocci > init_defs_builtins: /usr/bin/../lib/coccinelle/standard.h > meta: parse error: > File "contrib/coccinelle/free.cocci", line 15, column 0, charpos = 99 > around = '@@', > whole content = @@ > > xargs: spatch: exited with status 255; aborting > make: *** [Makefile:2866: contrib/coccinelle/free.cocci.patch] Error 1 > > [after] > $ make coccicheck > SPATCH contrib/coccinelle/free.cocci > make: *** [Makefile:2875: contrib/coccinelle/free.cocci.patch] Error 124 The "cat $@.log" is back, but there's still some issues with it on master now (I didn't change this) it'll spew a lot at you with xargs since we emit the whole error output for every file, seemingly, before cat-ing the file: make -j1 coccicheck SPATCH_FLAGS=--blahblah V=1 2>&1|grep -i example.*use|wc -l 464 Well, it's a bit better now on my series with a default batch size of 8: $ make -j1 coccicheck SPATCH_FLAGS=--blahblah V=1 2>&1|grep -i example.*use|wc -l 88 I got tired of dealing with the combination of shellscritp and make for the day. But maybe something to do as a follow-up if anyone's interested.