On 8/28/2019 12:15 PM, SZEDER Gábor wrote:
> On Wed, Aug 28, 2019 at 11:39:44AM -0400, Jeff King wrote:
>> On Wed, Aug 28, 2019 at 10:54:12AM -0400, Jeff King wrote:
>>
>>>> Unfortunately, however, while running './t5516-fetch-push.sh -r 1,79
>>>> --stress' to try to reproduce a failure caused by those mingled
>>>> messages, the same check only failed for a different reason so far
>>>> (both on Linux and macOS (on Travis CI)):
>>>
>>> There's some hand-waving argument that this should be race-free in
>>> 014ade7484 (upload-pack: send ERR packet for non-tip objects,
>>> 2019-04-13), but I am not too surprised if there is a flaw in that
>>> logic.
>>
>> By the way, I've not been able to reproduce this locally after ~10
>> minutes of running "./t5516-fetch-push.sh -r 1,79 --stress" on my Linux
>> box. I wonder what's different.
>>
>> Are you running the tip of master?
>
> Yeah, but this seems to be one of those "you have to be really lucky,
> even with --stress" cases.
>
> So... I was away from keyboard for over an hour and let it run on
> 'master', but it didn't fail. Then I figured I'd give it a try
> with Derrick's patch, because, well, why not, and then I got this
> broken pipe error in ~150 repetitions. Ran it again, same error after
> ~200 reps. However, I didn't understand how that patch could lead to
> broken pipe, so I went back to stressing master... nothing. So I
> started writing the reply to that patch saying that it seems to cause
> some racy failures on Linux, and was already proofreading before
> sending when the damn thing finally did fail. Oh, well.
>
> Then tried it on macOS, and it failed fairly quickly. For lack of
> better options I used Travis CI's debug shell to access a mac VM, and
> could reproduce the failure both with and without the patch before it
> timed out.

I'm running these tests under --stress now, but I'm not seeing the
error you saw. However, I do have a theory: the process exits before
flushing the packet line.
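For reference, here is a rough standalone C sketch of the bytes the other side should see when an ERR packet is followed by a flush-pkt. It assumes only the documented pkt-line framing (a 4-hex-digit length that counts the 4 header bytes, and "0000" as the flush-pkt). The helpers pkt_line(), pkt_flush() and err_then_flush() are hypothetical and are not git's pkt-line.c API. This sketch shows only the wire format. It does not reproduce the race.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not git's pkt-line.c API: frame `payload` as one
 * pkt-line at `out`: four lowercase hex digits giving the total length
 * (header included), followed by the payload bytes. Returns bytes written.
 * `out` must have room for strlen(payload) + 5 bytes (snprintf's NUL). */
static size_t pkt_line(char *out, const char *payload)
{
    size_t total = strlen(payload) + 4;

    assert(total <= 65520); /* protocol maximum for a single pkt-line */
    snprintf(out, 5, "%04zx", total);
    memcpy(out + 4, payload, total - 4); /* overwrites snprintf's NUL */
    return total;
}

/* Hypothetical helper: a flush-pkt is the special length "0000" with no
 * payload, marking the end of the current message. */
static size_t pkt_flush(char *out)
{
    memcpy(out, "0000", 4);
    return 4;
}

/* Hypothetical: what would be emitted if the ERR packet *and* the
 * flush-pkt are both written before the process exits. Messages longer
 * than the local buffer are truncated. */
static size_t err_then_flush(char *out, const char *msg)
{
    char line[256];
    size_t n;

    snprintf(line, sizeof line, "ERR %s\n", msg);
    n = pkt_line(out, line);
    n += pkt_flush(out + n);
    return n;
}
```

With msg = "upload-pack: not our ref", err_then_flush() writes 37 bytes: "0021ERR upload-pack: not our ref\n" (0x21 = 33 bytes including the header) followed by "0000".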
Adding this line before exit(1) should fix it:

	packet_writer_flush(writer);

I can send this in a v2, but it would be nice if you could test it in
your environment, since it has already demonstrated the failure.

Thanks,
-Stolee