Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1

On Thu, Feb 14 2019, Junio C Hamano wrote:

> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
>
>> Unfortunately, subtest 13 still hangs on NonStop, even with this
>> patch, so our Pipeline still hangs. I'm glad it's better on Azure,
>> but I don't think this actually addresses the root cause of the
>> hang.
>
> Sigh.
>
>> possible this is not the test that is failing, but actually the
>> git-http-backend? The code is not in a loop, if that helps. It is
>> not consuming any significant cycles. I don't know that part of
>> the code at all, sadly. The code is here:
>>
>> * in the operating system from here up *
>>   cleanup_children + 0x5D0 (UCr)
>>   cleanup_children_on_exit + 0x70 (UCr)
>>   git_atexit_dispatch + 0x200 (UCr)
>>   __process_atexit_functions + 0xA0 (DLL zcredll)
>>   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
>>   exit + 0x2A0 (DLL zcrtldll)
>>   die_webcgi + 0x240 (UCr)
>>   die_errno + 0x360 (UCr)
>>   write_or_die + 0x1C0 (UCr)
>>   end_headers + 0x1A0 (UCr)
>>   die_webcgi + 0x220 (UCr)
>>   die + 0x320 (UCr)
>>   inflate_request + 0x520 (UCr)
>>   run_service + 0xC20 (UCr)
>>   service_rpc + 0x530 (UCr)
>>   cmd_main + 0xD00 (UCr)
>>   main + 0x190 (UCr)
>>
>> Best guess is that a signal (SIGCHLD?) is possibly getting eaten
>> or neglected somewhere between the test, perl, and
>> git-http-backend.
>
> So we are trying to die(), which actually happens in die_webcgi(),
> and then try to write some message _but_ notice an error inside
> write_or_die() and try to exit because we do not want to recurse
> forever trying to die, giving a message to say how/why we died, and
> die because failing to give that message, forever.
>
> But in our attempt to exit(), we try to "cleanup children" and that
> is what gets stuck.

I have not paid enough attention to this thread to say whether this is
dumb, but just in case it's useful: for this class of problem, where
cleanup bites you for whatever reason in Perl, you can sometimes use this:

    use POSIX ();
    POSIX::_exit($code);

This will call "_exit" from the C library instead of Perl's "exit", so
the process goes away *now* and lets the OS deal with the mess. Perl's
"exit" will run around cleaning up stuff, freeing memory, running
destructors etc., all of which might have side effects you don't
want/care about, and might (as maybe in this case?) cause some hang.
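
The same distinction exists on the C side, which is where the backtrace
above is stuck: exit() runs handlers registered with atexit(), while
_exit() skips them. Below is a minimal sketch of that difference, not code
from git. The names cleanup_handler() and atexit_handler_ran() are made up
for illustration, and cleanup_handler() only stands in for something like
cleanup_children_on_exit().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe that the child's atexit handler reports on. */
static int report_fd = -1;

/* Hypothetical stand-in for a cleanup handler run at exit time. */
static void cleanup_handler(void)
{
	ssize_t ignored = write(report_fd, "x", 1);
	(void)ignored;
}

/*
 * Fork a child that registers cleanup_handler() and then terminates with
 * exit() (use_underscore_exit == 0) or _exit() (use_underscore_exit != 0).
 * Returns 1 if the handler ran in the child, 0 if it did not, -1 on error.
 */
int atexit_handler_ran(int use_underscore_exit)
{
	int fds[2];
	char byte;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: keep only the write end of the pipe. */
		close(fds[0]);
		report_fd = fds[1];
		if (atexit(cleanup_handler) != 0)
			_exit(2);
		if (use_underscore_exit)
			_exit(0);	/* terminate now; atexit handlers are skipped */
		exit(0);		/* runs atexit handlers before terminating */
	}

	/* Parent: close the write end so read() sees EOF once the child is gone. */
	close(fds[1]);
	n = read(fds[0], &byte, 1);
	close(fds[0]);
	if (waitpid(pid, NULL, 0) < 0)
		return -1;
	return n == 1 ? 1 : 0;
}
```

Calling atexit_handler_ran(0) should report that the handler ran, and
atexit_handler_ran(1) that it did not. Whether skipping the handlers is
acceptable in git-http-backend is a separate question, since child
processes would then be left for the OS to deal with.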

> One big difference before and after the /dev/zero change is that the
> process is now on a downstream of the pipe.  If we prepare a large
> file with a finite size full of NULs and replace /dev/zero with it,
> instead of feeding NULs from the pipe, would it change the equation?
