On Sat, Jan 09, 2021 at 05:48:10AM -0500, Jeff King wrote:
> On Sat, Jan 09, 2021 at 05:34:05AM -0500, Jeff King wrote:
>
> > For this _particular_ test, since we know that it is testing a v0-only
> > behavior, we might want to just loosen the test. This goes against the
> > point of adding it in 014ade7484 (upload-pack: send ERR packet for
> > non-tip objects, 2019-04-13), but it's the best we can do for now.
> > Something like this:
>
> Since this issue has been languishing for a while now with several
> "something like this" patches, I've packaged it up into something more
> palatable. I think we should just apply this and move on. We may still
> run into other similar races, but I don't think this one is worth
> spending more mental effort on.
>
> -- >8 --
> Subject: [PATCH] t5516: loosen "not our ref" error check
>
> Commit 014ade7484 (upload-pack: send ERR packet for non-tip objects,
> 2019-04-13) added a test that greps the output of a failed fetch to make
> sure that upload-pack sent us the ERR packet we expected. But checking
> this is racy; despite the argument in that commit, the client may still
> be sending a "done" line when the server exits, causing it to die() on a

Nit: I think using the word "after" would make the problematic sequence
of events a tad clearer, i.e. "... after the server has exited, ...".

> failed write() and never see the ERR packet at all.
>
> This fails quite rarely on Linux, but more often on macOS. However, it
> can be triggered reliably with:
>
>   diff --git a/fetch-pack.c b/fetch-pack.c
>   index 876f90c759..cf40de9092 100644
>   --- a/fetch-pack.c
>   +++ b/fetch-pack.c
>   @@ -489,6 +489,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>    done:
>    	trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
>    	if (!got_ready || !no_done) {
>   +		sleep(1);
>    		packet_buf_write(&req_buf, "done\n");
>    		send_request(args, fd[1], &req_buf);
>    	}

FWIW (not much?), I've run the test suite with that sleep(1) in place,
and there were no other test failures.

> This is a real user-visible race that it would be nice to fix, but it's
> tricky to do so: the client would have to speculatively try to read an
> ERR packet after hitting a write() error. And at least for this error,
> it's specific to v0 (since v2 does not enforce reachability at all).
>
> So let's loosen to test to avoid annoying racy failures. If we
> eventually do the read-after-failed-write thing, we can tighten it. And
> if not, v0 will grow increasingly obsolete as servers support v2, so the
> utility of this test will decrease over time anyway.

Makes sense. Back when I investigated this issue the default protocol
was still v0; now that we default to v2 I agree it's better to work
around the issue in the test instead of "fixing" the root cause with
that "trying to read ERR packet on error" hack.

Good, a year-and-a-half-old entry checked off from my todo list :)

Thanks.
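
For readers who want to see the race outside of Git, here is a minimal
standalone sketch of the "read-after-failed-write" idea mentioned in the
quoted commit message. It is not Git code and not a proposed patch: the
function name read_err_after_failed_write() is hypothetical, two plain
pipes stand in for the client/server connection, the error is a bare
text line rather than a real pkt-line ERR packet, and waitpid() plays
the role of the sleep(1) above by guaranteeing that the "server" has
exited before the client writes.

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo (not Git code). A "server" child sends an error line
 * and exits without reading; the "client" then fails to write "done" and
 * speculatively reads whatever the server left behind.
 *
 * Returns 0 if the write failed with EPIPE and a message was recovered
 * into buf (NUL-terminated), -1 otherwise.
 */
int read_err_after_failed_write(char *buf, size_t len)
{
	/* Illustrative text only; a real ERR packet uses pkt-line framing. */
	static const char err[] = "ERR upload-pack: not our ref";
	int c2s[2], s2c[2]; /* client->server and server->client pipes */
	pid_t pid;
	ssize_t n;
	int ret = -1;

	if (len == 0 || pipe(c2s) < 0)
		return -1;
	if (pipe(s2c) < 0) {
		close(c2s[0]);
		close(c2s[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(c2s[0]);
		close(c2s[1]);
		close(s2c[0]);
		close(s2c[1]);
		return -1;
	}
	if (pid == 0) {
		/* "server": report the error, then exit before "done" arrives */
		close(c2s[1]);
		close(s2c[0]);
		if (write(s2c[1], err, sizeof(err) - 1) < 0)
			_exit(1);
		_exit(0);
	}

	/* "client": keep only our ends of the two pipes */
	close(c2s[0]);
	close(s2c[1]);

	/* Make the race deterministic: the server is gone before we write. */
	waitpid(pid, NULL, 0);

	/* Ask for EPIPE instead of being killed by SIGPIPE (process-wide). */
	signal(SIGPIPE, SIG_IGN);

	if (write(c2s[1], "done\n", 5) < 0 && errno == EPIPE) {
		/* Speculative read: the error line is still in the pipe buffer. */
		n = read(s2c[0], buf, len - 1);
		if (n > 0) {
			buf[n] = '\0';
			ret = 0;
		}
	}

	close(c2s[1]);
	close(s2c[0]);
	return ret;
}
```

Calling this with a reasonably sized buffer (say, 64 bytes) shows both
halves of the problem: the write of "done" fails because the peer is
gone, yet the error message is still readable afterwards. A client that
die()s on the failed write never gets to the read, which is the
behavior the loosened test has to tolerate.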