Re: [PATCH 1/1] upload-pack: fix race condition in error messages

On 8/28/2019 12:15 PM, SZEDER Gábor wrote:
> On Wed, Aug 28, 2019 at 11:39:44AM -0400, Jeff King wrote:
>> On Wed, Aug 28, 2019 at 10:54:12AM -0400, Jeff King wrote:
>>
>>>> Unfortunately, however, while running './t5516-fetch-push.sh -r 1,79
>>>> --stress' to try to reproduce a failure caused by those mingled
>>>> messages, the same check only failed for a different reason so far
>>>> (both on Linux and macOS (on Travis CI)):
>>>
>>> There's some hand-waving argument that this should be race-free in
>>> 014ade7484 (upload-pack: send ERR packet for non-tip objects,
>>> 2019-04-13), but I am not too surprised if there is a flaw in that
>>> logic.
>>
>> By the way, I've not been able to reproduce this locally after ~10
>> minutes of running "./t5516-fetch-push.sh -r 1,79 --stress" on my Linux
>> box. I wonder what's different.
>>
>> Are you running the tip of master?
> 
> Yeah, but this seems to be one of those "you have to be really lucky,
> even with --stress" cases.
> 
> So...  I was away from keyboard for over an hour and let it run on
> 'master', but it didn't fail.  Then I figured I'd give it a try
> with Derrick's patch, because, well, why not, and then I got this
> broken pipe error in ~150 repetitions.  Ran it again, same error after
> ~200 reps.  However, I didn't understand how that patch could lead to
> broken pipe, so went back to stressing master...  nothing.  So I
> started writing the reply to that patch saying that it seems to cause
> some racy failures on Linux, and was already proofreading before
> sending when the damn thing finally did fail.  Oh, well.
> 
> Then tried it on macOS, and it failed fairly quickly.  For lack of
> better options I used Travis CI's debug shell to access a mac VM, and
> could reproduce the failure both with and without the patch before it
> timed out.

I'm running these tests under --stress now, but not seeing the error
you saw.

However, I do have a theory: the process exits before flushing the
packet line. Adding this line before exit(1) should fix it:

	packet_writer_flush(writer);

I can send this in a v2, but it would be nice if you could test this
in your environment that already demonstrated the failure.
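
To make the ordering in question concrete, here is a minimal standalone
sketch, not the actual upload-pack code. It assumes the usual pkt-line
framing (four hex digits giving the length, counting those four bytes,
followed by the payload) and a flush packet of "0000". The helpers
write_all(), format_pkt_line() and send_err_and_flush() are hypothetical
names made up for this illustration; they are not Git functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, retrying on short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: frame payload as one pkt-line in buf.
 * The 4-digit hex length prefix counts itself plus the payload.
 * Returns the framed length, or -1 if it does not fit.
 */
static int format_pkt_line(char *buf, size_t bufsize, const char *payload)
{
	size_t len;

	assert(payload != NULL);
	len = strlen(payload) + 4;
	if (len > 65520 || len + 1 > bufsize)
		return -1;
	snprintf(buf, bufsize, "%04zx%s", len, payload);
	return (int)len;
}

/*
 * Hypothetical helper: send "ERR <msg>" as a pkt-line, then a flush
 * packet ("0000"), so both are handed to the kernel before the caller
 * exits. Returns 0 on success, -1 on failure.
 */
static int send_err_and_flush(int fd, const char *msg)
{
	char payload[256];
	char line[300];
	int n = snprintf(payload, sizeof(payload), "ERR %s", msg);

	if (n < 0 || (size_t)n >= sizeof(payload))
		return -1;
	n = format_pkt_line(line, sizeof(line), payload);
	if (n < 0 || write_all(fd, line, (size_t)n) < 0)
		return -1;
	return write_all(fd, "0000", 4);
}
```

A caller following this pattern would invoke send_err_and_flush() on the
output descriptor immediately before exit(1), which is the ordering the
one-line change above is meant to enforce. Whether that actually closes
the race is exactly what needs testing.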

Thanks,
-Stolee



