Re: [ANNOUNCE] Git v2.33.0-rc2 (Build/Test Report)

On Mon, Aug 16, 2021 at 02:54:14PM -0400, Randall S. Becker wrote:

> >That 60 seconds is the timeout from t5562/invoke-with-content-length.
> >
> >So one, are you sure it's hanging forever, and not just for 60 seconds?
> 
> Absolutely sure. 48 hours because I forgot to check.
> 
> >And two, it is quite obvious there's some racing here. I'm not sure if this is indicative of a problem in the test suite, or in http-backend
> >itself (in which case it could be affecting real users).
> 
> How can I help track this down?

Here's what I found out so far. For my 60-second lag case, the test
_does_ complete as expected; it just takes a long time. So I think what
happens is this:

  - the invoke-with-content-length script sets up a SIGCLD handler

  - then it kicks off http-backend and writes to it

  - then it sleeps for 60 seconds, assuming that SIGCLD will interrupt
    the sleep

  - after the sleep finishes (whether by 60 seconds or because it was
    interrupted by the signal), we check a flag to see if our SIGCLD
    handler was called. If not, then we complain.

This usually completes instantaneously-ish, because the signal
interrupts our sleep. But very occasionally the child process dies
_before_ we hit the sleep, so we don't realize it.

So ideally we'd have some way of atomically checking our flag and then
sleeping only if it's not set. But I don't think that exists. The
closest we can come is using a series of smaller sleeps and checks. And
indeed, digging in the archive shows that Max already proposed such a
patch:

  https://lore.kernel.org/git/20190218205028.32486-1-max@xxxxxxxxxx/

It looks like it fell through the cracks, though. Maybe now is a good
time to resurrect it.
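Here is a minimal sketch of the "series of smaller sleeps and checks" idea. It is written in Python, not the Perl that the real t5562/invoke-with-content-length script uses, and it is not Max's actual patch. The names `wait_for_flag` and `run_child` are hypothetical. It uses SIGCHLD, of which SIGCLD is the System V alias. There is also a Python-specific wrinkle: since Python 3.5 (PEP 475), `time.sleep()` resumes after a signal handler runs instead of returning early. A single long sleep there would therefore never be cut short by the signal, which is one more reason to poll.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for the script's $exited flag, set by the handler.
exited = False


def on_sigchld(signum, frame):
    """Record that a child has exited (mirrors the script's signal handler)."""
    global exited
    exited = True


def wait_for_flag(is_set, timeout=60.0, interval=0.1):
    """Poll is_set() between short sleeps until it is true or timeout expires.

    The flag is checked *before* the first sleep, so a child that died
    early is noticed immediately. If it dies just after a check, we lose
    at most `interval` seconds instead of the whole `timeout`.
    """
    deadline = time.monotonic() + timeout
    while not is_set():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
    return True


def run_child(argv, body=b"", timeout=60.0):
    """Start argv, feed it body on stdin, then poll for the SIGCHLD flag."""
    global exited
    exited = False
    signal.signal(signal.SIGCHLD, on_sigchld)
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE)
    try:
        proc.stdin.write(body)
        proc.stdin.close()
    except BrokenPipeError:
        pass  # the child may already be gone; that is exactly the racy case
    ok = wait_for_flag(lambda: exited, timeout)
    proc.wait()  # reap the child whether or not the flag was seen
    return ok
```

With a child that exits immediately, such as `run_child([sys.executable, "-c", "pass"])`, this should return `True` within roughly one polling interval of the child's exit instead of stalling for the full timeout. Note the trade-off: polling shrinks the race window to `interval` rather than eliminating it.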

However, you are in that thread, too, and it didn't help your situation.
So I think your race is somehow different. It looks like there was some
weirdness around close() for you, though generally we _shouldn't_ be
hitting that close() at all, because we'd have gotten SIGCLD and set the
$exited flag in the interim.

-Peff
