On Sat, Mar 06 2021, Jeff King wrote: > On Fri, Mar 05, 2021 at 06:07:22PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> Using --no-includes also fixes a subtle bug introduced in >> 960154b9c17 (coccicheck: optionally batch spatch invocations, >> 2019-05-06) with duplicate hunks being written to the >> generated *.patch files. >> >> This is because that change altered a file-at-a-time for-loop to an >> invocation of "xargs -n X". This would not matter for most other >> programs, but it matters for spatch. >> >> This is because each spatch invocation will maintain shared lock files >> in /tmp, check if files being parsed were changed etc. I haven't dug >> into why exactly, but it's easy to reproduce the issue[2]. The issue >> goes away entirely if we just use --no-includes, which as noted above >> would have made sense even without that issue. > > This part still doesn't make any sense to me. If we are running with > SPATCH_BATCH_SIZE=1, which is the default, then "xargs -n" is still > going to run it in file-at-a-time mode. From spatch's perspective, > there's no difference between a for-loop and "xargs -n 1" (unless it > somehow cares about stdin, but it shouldn't). > > Using strace, I do see it creating files in /tmp, but they are all named > after the process id, and cleaned up before exit. So I don't see how > they could interfere with each other (certainly not in a sequential run, > but even if you were to use "xargs -P" to get parallel runs, they seem > distinct). > > If we increase the batch size, I'd expect _fewer_ duplicates. Because in > file-at-a-time mode with --all-includes, wouldn't every file that > mentions an include possibly end up emitting a patch for it? > > The results you show here (which do replicate for me) imply something > much weirder is going on: > >> with xargs -n 1: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> with xargs -n 2: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> with xargs -n 4: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c > > These results are wrong! They are not finding the entry in strbuf.h that > should be changed. > >> with xargs -n 16: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 2 +++ b/strbuf.h >> with xargs -n 64: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 2 +++ b/strbuf.h >> with xargs -n 128: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 2 +++ b/strbuf.h > > These ones are also wrong. Now we find the strbuf.h mention, but we are > finding it twice. > >> with xargs -n 512: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 1 +++ b/strbuf.h > > And this one, which is given all of the paths in one invocation gets it > right. I'd expect that over the "128" version, which is running two > independent spatch invocations. But the fact that "64" and "16" produce > exactly two duplicates makes little sense. And even less that 1, 2, and > 4 don't find the header change at all. Yes, I don't know what's going on with spatch. Just the observed results of duplication before/after youc commit. > Running the same test with a for loop produces the same (wrong) results > as "-n 1", as expected: > > $ for i in *.c; do > spatch --sp-file test.cocci --all-includes --patch . $i 2>/dev/null > done | grep -F +++ | sort | uniq -c > 1 +++ b/convert.c > 1 +++ b/strbuf.c > > So in short I have no idea what spatch is doing. But I remain > unconvinced that there is anything wrong with the batch-size patch. Right, but that's an unrelated "didn't change this thing it should have" bug (which we had before), not the duplicate hunk bug. > Running it like your patch will, feeding the header directly along with > --no-includes, does find the correct result: > > $ for i in *.c *.h; do > spatch --sp-file test.cocci --no-includes --patch . $i 2>/dev/null > done | grep -F +++ | sort | uniq -c > 1 +++ b/convert.c > 1 +++ b/strbuf.c > 1 +++ b/strbuf.h > > though I am still concerned by René's example which is missing more > results. *nod*. I just sent a patch on top to fix that, although it undoes most of the speedup in this series.