On Thu, Sep 01 2022, SZEDER Gábor wrote:

> On Wed, Aug 31, 2022 at 10:57:54PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Optimize the very slow "coccicheck" target to take advantage of
>> incremental rebuilding, and fix outstanding dependency problems with
>> the existing rule.
>>
>> The rule is now faster both on the initial run, as we can make better
>> use of GNU make's parallelism than the old ad-hoc combination of
>> make's parallelism combined with $(SPATCH_BATCH_SIZE) and/or the
>> "--jobs" argument to "spatch(1)".
>
> On my machine a "from scratch" 'make -j4 coccicheck' without the
> unacceptably slow 'unused.cocci' takes 9m28s, and with
> SPATCH_BATCH_SIZE=32 it takes 6m39s. If we invoke 'spatch' as in the
> patch below and let one 'spatch' invocation process all source files
> one by one (i.e. unlike the batched invocations), using its '--jobs'
> option, then it takes 4m56s.
>
> This patch series is slower than any of the above, as it takes 10m3s.

But is such from-scratch building your primary use-case? When I e.g. do
a "make coccicheck" on master, and then:

	git merge gitster/ac/bitmap-lookup-table

or whatever, it takes much less time than before, ditto things like:

	time make -W grep.h contrib/coccinelle/free.cocci.patch

I.e. changing a file and wanting to re-run the check for all of its
reverse dependencies (this is with the *.o files).

>> * Before this change running "make coccicheck" would by default end
>>   up pegging just one CPU at the very end for a while, usually as
>>   we'd finish whichever *.cocci rule was the most expensive.
>>
>>   This could be mitigated by combining "make -jN" with
>>   SPATCH_BATCH_SIZE, see 960154b9c17 (coccicheck: optionally batch
>>   spatch invocations, 2019-05-06). But doing so required careful
>>   juggling, as e.g. setting both to 4 would yield 16 workers.
>
> As pointed out previously, this is not the case.

Yes, we discussed this in v1; I just missed it in this re-roll. Sorry!
I'll fix it in the next iteration.
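As an aside, a minimal sketch of what the old rule's batching amounted
to (not the actual Makefile recipe; 'echo' stands in for the real
spatch command and the *.c names are made up): with SPATCH_BATCH_SIZE=2
xargs hands the sources to one spatch invocation two at a time.

```shell
# xargs -n <batch size> splits the source list into batches, one
# spatch invocation per batch; 'echo spatch ...' just prints what
# would be run instead of running it.
echo one.c two.c three.c four.c five.c |
	xargs -n 2 echo spatch --sp-file free.cocci --patch .
# → spatch --sp-file free.cocci --patch . one.c two.c
#   spatch --sp-file free.cocci --patch . three.c four.c
#   spatch --sp-file free.cocci --patch . five.c
```

This is the batching that make's dependency tracking can't see into:
make only knows about the one %.cocci.patch target, not the per-batch
work inside it.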
>  include config.mak.uname
>  -include config.mak.autogen
> @@ -3139,19 +3137,17 @@ COCCI_TEST_RES = $(wildcard contrib/coccinelle/tests/*.res)
>
>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
>  	$(QUIET_SPATCH) \
> -	if test $(SPATCH_BATCH_SIZE) = 0; then \
> -		limit=; \
> -	else \
> -		limit='-n $(SPATCH_BATCH_SIZE)'; \
> -	fi; \
> -	if ! echo $(COCCI_SOURCES) | xargs $$limit \
> -		$(SPATCH) $(SPATCH_FLAGS) \
> -		--sp-file $< --patch . \
> +	filelist=$@.filelist; \
> +	: Blank lines between filenames are necessary to put each file in separate group ; \
> +	printf "%s\\n\\n" $(COCCI_SOURCES) >$$filelist && \
> +	if ! $(SPATCH) $(SPATCH_FLAGS) --jobs $(SPATCH_JOBS) \
> +		--sp-file $< --patch . --file-groups $$filelist \

Yes, I think it's unfortunate that there *are* faster ways to run it,
but they rely on doing things in batches of N, which doesn't jibe well
with make's incremental model of the world. I think we can have either
that or the ability to incrementally (re)build, but not both. Do you
see some easy way to have our cake & eat it too here?

If you/others really feel strongly about this use case I could also
keep the old method, and just hide all of this behind a
SPATCH_INCREMENTAL=Y flag or something. But I'm hoping we can turn
this (mostly) into a win-win.
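For anyone skimming: the "blank lines between filenames" trick in the
quoted recipe works because printf re-applies its format string to each
argument, so each source file ends up in its own blank-line-separated
group in the --file-groups list. A standalone sketch (made-up file
names, "filelist" in place of the recipe's $@.filelist):

```shell
# printf repeats "%s\n\n" once per argument, so every filename is
# followed by a blank line, i.e. its own group for spatch's
# --file-groups option.
COCCI_SOURCES="grep.c builtin/log.c diff.c"
printf "%s\n\n" $COCCI_SOURCES >filelist
cat filelist
# → grep.c
#
#   builtin/log.c
#
#   diff.c
#
```

Note $COCCI_SOURCES is deliberately unquoted so the shell word-splits
it into separate printf arguments, mirroring how make expands
$(COCCI_SOURCES) in the recipe.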