Thanks for the welcome! I appreciate the lengthy feedback, because above all else I'd love to understand the low-level workings of git better than I do now. On Thu, Oct 29, 2020 at 9:40 PM Jeff King <peff@xxxxxxxx> wrote: > > On Thu, Oct 29, 2020 at 06:40:59PM -0700, Daniel Duvall wrote: > > > During stateless packfile negotiation, it is normal behavior for > > stateless RPC clients (e.g. git-remote-curl) to send multiple > > upload-pack requests with the first containing only the > > wants/shallows/deepens/filters followed by a flush. > > Hmm, is this normal? I'd expect it to send a "done" after the flush, and > indeed that's what happens if I try it. I see: > > E.g., here's GIT_TRACE_CURL output from a v0 request (fetching into an > empty repo): > > Send data: 00a8want 4b379b7a1b97790f805c365b1e58b75b70d3d904 multi_ack_ > Send data: detailed no-done side-band-64k thin-pack ofs-delta deepen-si > Send data: nce deepen-not agent=git/2.29.2.477.g2cec8aa0af.00000009done > Send data: . > Info: upload completely sent off: 181 out of 181 bytes > > However, if I add --depth 1 to my fetch, I get: > > Send data: 00a8want 4b379b7a1b97790f805c365b1e58b75b70d3d904 multi_ack_ > Send data: detailed no-done side-band-64k thin-pack ofs-delta deepen-si > Send data: nce deepen-not agent=git/2.29.2.477.g2cec8aa0af.000cdeepen 1 > Send data: 0000 > Info: upload completely sent off: 184 out of 184 bytes > > Which maybe makes sense if we need the shallow response from the server > to determine the next step of the request. It doesn't matter for my > trivial case here (we end up resending the same request with a "done" > added), but I guess it could in other cases. Right. This is what I was observing too—in a trivial case, the same roundtrip being made again with nothing additional but "done". I should have clarified that I thought it was normal only when a depth was provided. I've been learning the packfile negotiation protocols as I've been debugging this issue, so I probably should have reserved my assertion of "normal" for when I have a firm grasp of them. :) > > > When run in stateless mode, continuing on without first checking that > > the client request has reached EOF can result in a bad file descriptor > > during get_common_commits. > > When you say "bad file descriptor" that makes me think we're getting > EBADF after trying to use a closed descriptor. But we'd die() as soon as > the pkt-line code tries to read and gets eof, right? Right. It's still a valid file descriptor but a premature die() upon EOF. I'll clarify that. > > That's also worth fixing, but I want to make sure I understand the > problem completely. I think this part of the commit message would be a > good place to talk about the real-world effects: > > - the client doesn't care; by definition it has hung up at this point > and will keep going with its next request That's true in the case where the server doesn't surface the non-zero exit code from git-upload-pack—which results in git-http-backend existing non-zero as well. An apache setup I used in testing doesn't seem to care about the failure—it responds 200 and so the client is happy—but in our (dayjob) case, we're running a Phabricator instance which handles a non-zero exit from git-http-backend by responding 500, and the client dies. $ git fetch --depth 1 https://phabricator.wikimedia.org/source/phabricator.git HEAD error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500 fatal: the remote end hung up unexpectedly > > - likewise, the server doesn't care in terms of its response; by > definition it is stateless, so the next request the client makes > will start fresh That makes sense to me. > > - it is annoying for server admins who get a bunch of useless logs > with "remote end hung up unexpected", or if they are tracking exit > codes from upload-pack Yes! Phabricator tries to work around this by detecting this stderr output with a regex, which doesn't seem to hold up too well over time. See https://secure.phabricator.com/D21484 This is actually how I started down this rabbithole. It seemed odd to me that git-http-backend was exiting non-zero in the first place. Then again, I didn't much understand packfile negotiation, so it was just a hunch that there was some kind of bug. Knowing more about the protocol now, it definitely seems buggy. > > As somebody who has admin'd a busy git site, unexpected client network > drops are just a fact of life and you have to look past them in your > logs. But I still think it's worth keeping it as uncluttered as possible > and having upload-pack handle this without an error message. > > > Instead, upload-pack should gently peek for an EOF between the sending > > of shallow/unshallow lines (followed by flush) and the reading of client > > haves. If the client has hung up at this point, exit normally. > > Should we do this only if we saw a deepen line? From my reading of the > client code, that's the only thing that would cause this early request. > I don't know if there's any particular advantage to being more strict > here. I'm not sure what would be more correct. It seems to go against the "server doesn't care" in a stateless case to be that strict, but maybe there are additional benefits to strictness I'm not aware of. > > If we're _not_ going to be strict, then I actually wonder if we ought to > simply teach get_common_commits() that seeing an EOF is OK in stateless > mode, wherever it comes. It can't possibly impact the correctness of the > protocol conversation (since we're stateless and the client is gone), > but maybe it's useful if you're trying to count how often clients really > do hang up. I originally took that approach, but gently handling an EOF in the get_common_commits loop resulted in a NAK being sent back because of: if (packet_reader_read(reader) != PACKET_READ_NORMAL) { [...] if (data->have_obj.nr == 0 || data->multi_ack) packet_write_fmt(1, "NAK\n"); [...] if (data->stateless_rpc) exit(0); [...] } which the client died on with an "expected shallow list" message. I didn't see a straightforward way of modifying the conditions _inside_ the loop while ensuring I wasn't changing any expected behavior upon EOF. > > > --- /dev/null > > +++ b/t/t9904-upload-pack-stateless-timely-eof.sh > > We usually try to group related tests by number. Maybe t5705 would be a > better spot? I also wondered if this could go into t5704, but its title > is "protocol violations". It's not clear to me yet if this is a > violation that happens to be mostly harmless, or something we need to be > doing. :) Ah! Okay, I was wondering about that, whether the numbered prefixes were serial or otherwise meaningful. I'll try to group it appropriately. > > > +test_expect_success 'upload-pack outputs flush and exits ok' ' > > + test_commit initial && > > + head=$(git rev-parse HEAD) && > > + hexsz=$(test_oid hexsz) && > > + > > + printf "%04xwant %s\n%04xshallow %s\n000ddeepen 1\n0000" \ > > + $(($hexsz + 10)) $head $(($hexsz + 13)) $head >request && > > We have a helper function that makes this a bit easier to read: > > diff --git a/t/t9904-upload-pack-stateless-timely-eof.sh b/t/t9904-upload-pack-stateless-timely-eof.sh > index f8385a7ebd..1108401e8f 100755 > --- a/t/t9904-upload-pack-stateless-timely-eof.sh > +++ b/t/t9904-upload-pack-stateless-timely-eof.sh > @@ -9,10 +9,13 @@ D=$(pwd) > test_expect_success 'upload-pack outputs flush and exits ok' ' > test_commit initial && > head=$(git rev-parse HEAD) && > - hexsz=$(test_oid hexsz) && > > - printf "%04xwant %s\n%04xshallow %s\n000ddeepen 1\n0000" \ > - $(($hexsz + 10)) $head $(($hexsz + 13)) $head >request && > + { > + packetize "want $head" && > + packetize "shallow $head" && > + packetize "deepen 1" && > + printf "0000" > + } >request && > > git upload-pack --stateless-rpc "$(pwd)" <request >actual && > > > > + git upload-pack --stateless-rpc "$(pwd)" <request >actual && > > You can just use "." here, which is a little shorter. It's not entirely > cosmetic; the difference between $PWD and $(pwd) on Windows always trips > me up, so I try to avoid using either whenever I can. ;) Very cool re: the helpers. I'll make those changes. > > It would be nice if we could test this through a real use of Git, but it > might not be worth the hassle. I guess we'd have to mine the apache logs > in one of our http test scripts to see if upload-pack failed. And I > guess if we _do_ change the client side to stop sending the extra > request but want to treat historical clients more gracefully, we'd still > need a manual test like this. Yeah I just went with the minimal case I'm familiar with, but I'm game to set up a more thorough case with some guidance. > > > --- a/upload-pack.c > > +++ b/upload-pack.c > > @@ -1344,7 +1344,18 @@ void upload_pack(struct upload_pack_options *options) > > PACKET_READ_DIE_ON_ERR_PACKET); > > The actual fix looks correct to me, modulo the alternatives I raised > earlier. > > This function only handles the v0 protocol. For v2, we end up in > upload_pack_v2(). But my reading of the client side do_fetch_pack_v2() > is that it _doesn't_ send this extra request. And a simple test seems to > confirm it. Which gives me further pause as to whether the extra request > is necessary for v0. When I do a trace using v2, I see two roundtrip requests as well. I haven't tested the exit status of git-upload-pack in that case however. It's getting late for me but I'll investigate tomorrow. > > > Well, that review ended up a bit longer than I had imagined. So let me > add what I should have said at the top: welcome to the Git list, and > thanks for looking into this issue. :) Thanks again! I appreciate all the feedback.