Re: [PATCH v2] fetch-pack: try harder to read an ERR packet

Jeff King <peff@xxxxxxxx> · Tue, 28 Apr 2020 17:17:36 -0400

On Tue, Apr 28, 2020 at 02:02:33PM -0700, Junio C Hamano wrote:

> > I think the "lucky" case happens pretty routinely. The situation we're
> > trying to catch here is that server does:
> >
> >    packet_write("ERR I don't like your request for some reason");
> >    die("exiting");
> 
> OK, if we assume that the communication route is never flaky and
> everything writtten will go through before TCP shutdown that would
> happen when die() kills the process (or like test environment that
> most communication goes locally between two processes), sure, it may
> look common enough to be worth "fixing".  I simply did not realize
> that was the shared assumption behind this patch before I went back
> to the original discussion that was about a racy test.

Certainly the communication route will _sometimes_ be flaky. And there's
nothing we can do there except say "Broken pipe" or similar, whether the
other side said ERR or not. So any reads have to be speculative. But
assuming TCP is functioning, some portion of our ERR packets are simply
dropped from the incoming TCP buffers. I suspect we don't get more
reports of this in the wild (and are mostly annoyed by it in racy tests)
because:

  - ERR conditions don't happen all that often (though we have been
    adding more recently)

  - users may not realize they _could_ have gotten a good message
    (especially since historically many of these conditions did just
    involve the server hanging up)

  - I think the race may be easier to win on real networks with latency.
    Locally, if I write "want bogus" and then "done", the other side may
    process the "want" between the two and close the pipe. But if
    there's 50ms between client and server, we've usually managed to
    "done" into the systems TCP buffer, if not onto the wire, by the
    time the other side has figured out the error and gotten a FIN
    packet back to us.

> As long as an extra read on the side that just got a write error
> won't throw us into a deadlock, I think I am OK, but I am still not
> sure if the code complexity to have two write_in_full() is worth it.

Yes, I don't see the point of that.

> If the mechanism to do this were limited to the packet IO layer, it
> may be more palatable, though.

Agreed. There's not a single place where we write, though, since we
often form packets in local buffers and then write() them out manually.
So it does have to be sprinkled around fetch-pack.c. But certainly the
damage can be limited to that client network code.

-Peff