Re: [PATCH] http.c: clear the 'finished' member once we are done with it

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 24 May 2022 22:38:30 +0200

On Tue, May 24 2022, Junio C Hamano wrote:

> [...]
> This incidentally is a good illustration why the thread-starter
> patch that did
>
> 	if (&finished == slot->finished)
> 		slot->finished = NULL;
>
> would be sufficient, and the "clear only ours" guard is not
> necessary, I think.  If the inner run_active_slot() did not trigger
> a callback that adds more reuse of the slot, it will clear
> slot->finished to NULL itself, with or without the guard.  And the
> outer run_active_slot() may fail to clear if the guard is there, but
> slot->finished is NULL in that case, so there is no point in clearing
> it again.
>
> And if the inner run_active_slot() did trigger a callback that ended
> up reusing the slot, then eventually the innermost one would have
> cleared slot->finished to NULL, with or without the guard, before it
> returned the control to inner run_active_slot().  The inference goes
> the same way to show that the guard is not necessary but is not
> hurting.
>
> I _think_ we can even get away by not doing anything to
> slot->finished at the end of run_active_slot(), as we are not
> multi-threaded and the callee only returns to the caller, but if it
> helps pleasing the warning compiler, I'd prefer the simplest
> workaround, perhaps with an unconditional clearing there?

I'll admit I haven't fully looked into this again, but does anything in
the subsequent analysis suggest that my original patch wouldn't be a
working solution to this, still:
https://lore.kernel.org/git/patch-1.1-1cec367e805-20220126T212921Z-avarab@xxxxxxxxx/ ?

The advantage of it over any small and narrow fix like your hunk quoted
above is that it doesn't make the end result look as though we care
about e.g. thread races, which evidently is something more than one
person looking at this has (needlessly) ended up worrying about.

> What did I miss?  I must be missing something, as I can explain how
> the current "(*slot->finished) = 1" with "while (finished)"
> correctly works, but I cannot quite explain why the original "while
> (slot->in_use)" would not, which is annoying.

Perhaps that change was also worried about thread safety? I only briefly
re-looked at this again, but I don't think I ever found exactly what
that 2006-era fix was meant to fix, specifically.

> In other words, why we needed baa7b67d (HTTP slot reuse fixes,
> 2006-03-10) in the first place?  It is possible that we had some
> code paths that forgot to drop in_use before the inner run_active
> returned that have been fixed in the 16 years and this fix was
> hiding that bug, but I dunno.

I haven't found that out either, either back in January or just now.