Re: [PATCH 3/5] cocci: make "coccicheck" rule incremental

On Thu, Aug 25 2022, SZEDER Gábor wrote:

Thanks for taking a look!

> On Thu, Aug 25, 2022 at 04:36:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> * Since we create a single "*.cocci.patch+" we don't know where to
>>   pick up where we left off. Note that (per [1]) we've had a race
>>   condition since 960154b9c17 (coccicheck: optionally batch spatch
>>   invocations, 2019-05-06) which might result in us producing corrupt
>>   patches due to the concurrent appending to "$@+" in the pre-image.
>> 
>>   That bug is now fixed.
>
> There is no bug, because there is no concurrent appending to "$@+".
> The message you mention seems to be irrelevant, as it talks about
> 'xargs -P', but the invocation in '%.cocci.patch' targets never used
> '-P'.

I think this is just confusing, I'll amend/rephrase.

And at this point I honestly can't remember if I'm conflating this with
an issue with my earlier proposed series here (I drafted this a while
ago), or a rather obscure "hidden" feature that I did use to speed up
coccicheck for myself for a while. Which is that you can do e.g.:

	make coccicheck SPATCH_BATCH_SIZE="8 -P 8"

Which will invoke xargs with the "-P 8" option in batches of 8 files.

That's never a thing that 960154b9c17 expected, and it only worked by
accident, but it *would* work and, unless you got unlucky with the races
involved, would generally speed up your coccicheck.
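A rough sketch of why that works, assuming the recipe interpolates the
batch size unquoted after "xargs -n" (the helper name show_xargs_cmd is
made up for illustration, and nothing is actually executed here; the
function only prints the command line the shell would build):

```shell
# show_xargs_cmd <batch size>
# Prints the number of words and the command line that results when
# the batch size is expanded unquoted.
show_xargs_cmd () {
	batch=$1
	# Unquoted on purpose: the shell splits $batch on whitespace,
	# so "8 -P 8" turns into three separate arguments.
	set -- xargs -n $batch spatch --sp-file free.cocci
	echo "$#: $*"
}

show_xargs_cmd 8
show_xargs_cmd "8 -P 8"
```

With a plain number xargs only sees "-n 8". With "8 -P 8" the extra
"-P 8" rides along as a separate option, which is where the parallel
(and racy) invocations come from.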

>> Which is why we'll not depend on $(FOUND_H_SOURCES) but the *.o file
>> corresponding to the *.c file, if it exists already. This means that
>> we can do:
>> 
>>     make all
>>     make coccicheck
>>     make -W column.h coccicheck
>> 
>> By depending on the *.o we piggy-back on
>> COMPUTE_HEADER_DEPENDENCIES. See c234e8a0ecf (Makefile: make the
>> "sparse" target non-.PHONY, 2021-09-23) for prior art of doing that
>> for the *.sp files. E.g.:
>> 
>>     make contrib/coccinelle/free.cocci.patch
>>     make -W column.h contrib/coccinelle/free.cocci.patch
>> 
>> Will take around 15 seconds for the second command on my 8 core box if
>> I didn't run "make" beforehand to create the *.o files. But around 2
>> seconds if I did and we have those "*.o" files.
>> 
>> Notes about the approach of piggy-backing on *.o for dependencies:
>> 
>> * It *is* a trade-off since we'll pay the extra cost of running the C
>>   compiler, but we're probably doing that anyway.
>
> This assumption doesn't hold, and I very much dislike the idea of
> depending on *.o files:

It's my fault for not calling this out more explicitly, but it only
*optionally* depends on the *.o files: if you don't have them compiled
already, a "make coccicheck" will just use "spatch", and nothing else.

See the CI run/output for this series:
https://github.com/avar/git/runs/8017916844?check_suite_focus=true

>   - Our static-analysis CI job doesn't build Git, now it will have to.

Aside from this series which doesn't change how it works, maintaining
this doesn't seem important to me. E.g. as I noted in [1] coccinelle
will happily run on code that doesn't even compile.

So running the compiler during "static-analysis" (or another step that
it would depend on, maybe "pedantic" or "sparse") seems like a good (but
separate) change.

You don't want to wonder about odd coccinelle output, only to see it's
trying to make sense of C source that doesn't even compile.

>   - I don't have Coccinelle installed, because my distro doesn't ship
>     it, and though the previous release did ship it, it was outdated.
>     Instead I use Coccinelle's most recent version from a container
>     which doesn't contain any build tools apart from 'make' for 'make
>     coccicheck'.
>
>     With this patch series I can't use this containerized Coccinelle
>     at all, because even though I've already built git on the host,
>     the dependency on *.o files triggers a BUILD-OPTIONS check during
>     'make coccicheck', and due to the missing 'curl-config' the build
>     options do differ, triggering a rebuild, which in the absence of a
>     compiler fails.
>
>     And then the next 'make' on the host will have to rebuild
>     everything again...

There may be some odd interaction here, but it's unclear if you've
actually tried to do this with this series, because unless I've missed
some edge case this should all still work, per the above.

What *won't* work is avoiding potential re-compilation of *.o files if
you *have them already* when you run "make coccicheck". I'm not familiar
with this type of setup, are you saying you're running "make coccicheck"
on a working directory that already has *.o files, but you want it to
ignore the *.o?

That could easily be made optional, but I just assumed that nobody would
care. If you want that, can you try this on top and see if it works for
you?
	
	diff --git a/Makefile b/Makefile
	index 9410a587fc0..11d83c490b4 100644
	--- a/Makefile
	+++ b/Makefile
	@@ -3174,6 +3174,11 @@ TINY_FOUND_H_SOURCES += strbuf.h
	 	$(call mkdir_p_parent_template)
	 	$(QUIET_GEN) >$@
	 
	+SPATCH_USE_O_DEPENDENCIES = yes
	+ifeq ($(COMPUTE_HEADER_DEPENDENCIES),no)
	+SPATCH_USE_O_DEPENDENCIES =
	+endif
	+
	 define cocci-rule
	 
	 ## Rule for .build/$(1).patch/$(2); Params:
	@@ -3181,7 +3186,7 @@ define cocci-rule
	 # 2 = $(2)
	 COCCI_$(1:contrib/coccinelle/%.cocci=%) += .build/$(1).patch/$(2)
	 .build/$(1).patch/$(2): GIT-SPATCH-DEFINES
	-.build/$(1).patch/$(2): $(if $(wildcard $(3)),$(3),$(if $(filter $(USE_TINY_FOUND_H_SOURCES),$(3)),$(TINY_FOUND_H_SOURCES),.build/contrib/coccinelle/FOUND_H_SOURCES))
	+.build/$(1).patch/$(2): $(if $(and $(SPATCH_USE_O_DEPENDENCIES),$(wildcard $(3))),$(3),$(if $(filter $(USE_TINY_FOUND_H_SOURCES),$(3)),$(TINY_FOUND_H_SOURCES),.build/contrib/coccinelle/FOUND_H_SOURCES))
	 .build/$(1).patch/$(2): $(1)
	 .build/$(1).patch/$(2): .build/$(1).patch/% : %
	 	$$(call mkdir_p_parent_template)

I'd need to split up the already long line, but with *.o files compiled
and:
	
	$ time make -W column.h contrib/coccinelle/free.cocci.patch COMPUTE_HEADER_DEPENDENCIES=yes
	    CC wt-status.o
	    CC builtin/branch.o
	    CC builtin/clean.o
	    CC builtin/column.o
	    CC builtin/commit.o
	    CC builtin/tag.o
	    CC column.o
	    CC help.o
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/builtin/commit.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/builtin/tag.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/builtin/column.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/column.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/wt-status.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/builtin/branch.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/builtin/clean.c
	    SPATCH .build/contrib/coccinelle/free.cocci.patch/help.c
	    SPATCH MERGE contrib/coccinelle/free.cocci.patch
	
	real    0m0.550s
	user    0m0.505s
	sys     0m0.096s

But if you set COMPUTE_HEADER_DEPENDENCIES=no it'll take ~4s (on that
box), and we'll re-apply the free.cocci rule to all the *.c files (and
this is all with the caching mechanism in 5/5, real "spatch" will be
much slower).

If you already have git compiled (or partially compiled) the "happy
path" to avoiding work is almost definitely to re-compile that *.o
because the *.c changed, at which point we'll be able to see if we even
need to re-run any of the coccinelle rules. Usually we'll need to re-run
a far smaller set than the full set we operate on now.

>>  * We can take better advantage of parallelism, while making sure that
>>    we don't racily append to the contrib/coccinelle/swap.cocci.patch
>>    file from multiple workers.
>> 
>>    Before this change running "make coccicheck" would by default end
>>    up pegging just one CPU at the very end for a while, usually as
>>    we'd finish whichever *.cocci rule was the most expensive.
>> 
>>    This could be mitigated by combining "make -jN" with
>>    SPATCH_BATCH_SIZE, see 960154b9c17 (coccicheck: optionally batch
>>    spatch invocations, 2019-05-06). But doing so required careful
>>    juggling, as e.g. setting both to 4 would yield 16 workers.
>
> No, setting both to 4 does yield 4 workers.
>
> SPATCH_BATCH_SIZE has nothing to do with parallelism; it is merely the
> number of C source files that we pass to a single 'spatch' invocation,
> but for any given semantic patch it's still a sequential loop.

Thanks, will fix. I see I conflated SPATCH_BATCH_SIZE with spatch's
--jobs there (although from experimentation that seems to have pretty
limited parallelism).
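To make the batching point concrete, here's a small sketch using a
stand-in command instead of spatch (run_batches and the file names are
made up for illustration). Without "-P", xargs runs one invocation after
another, so the batch size only changes how many files each invocation
gets:

```shell
# run_batches <batch size> <file>...
# Prints one line per invocation, reporting how many files it was given.
run_batches () {
	batch=$1
	shift
	printf '%s\n' "$@" |
	xargs -n "$batch" sh -c 'echo "invocation with $# files"' fake-spatch
}

run_batches 4 a.c b.c c.c d.c e.c f.c g.c h.c i.c j.c
```

Ten files with a batch size of 4 give three invocations (4, 4 and 2
files), run sequentially. Any parallelism has to come from "make -jN"
across the different *.cocci rules.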

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@xxxxxxxxxxxxxxxxxxx/



