Re: [PATCH] upload-pack: allow stateless client EOF just prior to haves

Daniel Duvall <dan@xxxxxxxxx> · Fri, 30 Oct 2020 00:47:29 -0700

Thanks for the welcome! I appreciate the lengthy feedback, because
above all else I'd love to understand the low-level workings of git
better than I do now.

On Thu, Oct 29, 2020 at 9:40 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Oct 29, 2020 at 06:40:59PM -0700, Daniel Duvall wrote:
>
> > During stateless packfile negotiation, it is normal behavior for
> > stateless RPC clients (e.g. git-remote-curl) to send multiple
> > upload-pack requests with the first containing only the
> > wants/shallows/deepens/filters followed by a flush.
>
> Hmm, is this normal? I'd expect it to send a "done" after the flush, and
> indeed that's what happens if I try it. I see:
>
> E.g., here's GIT_TRACE_CURL output from a v0 request (fetching into an
> empty repo):
>
>  Send data: 00a8want 4b379b7a1b97790f805c365b1e58b75b70d3d904 multi_ack_
>  Send data: detailed no-done side-band-64k thin-pack ofs-delta deepen-si
>  Send data: nce deepen-not agent=git/2.29.2.477.g2cec8aa0af.00000009done
>  Send data: .
>  Info: upload completely sent off: 181 out of 181 bytes
>
> However, if I add --depth 1 to my fetch, I get:
>
>   Send data: 00a8want 4b379b7a1b97790f805c365b1e58b75b70d3d904 multi_ack_
>   Send data: detailed no-done side-band-64k thin-pack ofs-delta deepen-si
>   Send data: nce deepen-not agent=git/2.29.2.477.g2cec8aa0af.000cdeepen 1
>   Send data: 0000
>   Info: upload completely sent off: 184 out of 184 bytes
>
> Which maybe makes sense if we need the shallow response from the server
> to determine the next step of the request. It doesn't matter for my
> trivial case here (we end up resending the same request with a "done"
> added), but I guess it could in other cases.

Right. This is what I was observing too—in a trivial case, the same
roundtrip being made again with nothing additional but "done". I
should have clarified that I thought it was normal only when a depth
was provided. I've been learning the packfile negotiation protocols as
I've been debugging this issue, so I probably should have reserved my
assertion of "normal" for when I have a firm grasp of them. :)

>
> > When run in stateless mode, continuing on without first checking that
> > the client request has reached EOF can result in a bad file descriptor
> > during get_common_commits.
>
> When you say "bad file descriptor" that makes me think we're getting
> EBADF after trying to use a closed descriptor. But we'd die() as soon as
> the pkt-line code tries to read and gets eof, right?

Right. It's still a valid file descriptor but a premature die() upon
EOF. I'll clarify that.

>
> That's also worth fixing, but I want to make sure I understand the
> problem completely. I think this part of the commit message would be a
> good place to talk about the real-world effects:
>
>   - the client doesn't care; by definition it has hung up at this point
>     and will keep going with its next request

That's true in the case where the server doesn't surface the non-zero
exit code from git-upload-pack—which results in git-http-backend
existing non-zero as well. An apache setup I used in testing doesn't
seem to care about the failure—it responds 200 and so the client is
happy—but in our (dayjob) case, we're running a Phabricator instance
which handles a non-zero exit from git-http-backend by responding 500,
and the client dies.

$ git fetch --depth 1
https://phabricator.wikimedia.org/source/phabricator.git HEAD
error: RPC failed; HTTP 500 curl 22 The requested URL returned error: 500
fatal: the remote end hung up unexpectedly

>
>   - likewise, the server doesn't care in terms of its response; by
>     definition it is stateless, so the next request the client makes
>     will start fresh

That makes sense to me.

>
>   - it is annoying for server admins who get a bunch of useless logs
>     with "remote end hung up unexpected", or if they are tracking exit
>     codes from upload-pack

Yes! Phabricator tries to work around this by detecting this stderr
output with a regex, which doesn't seem to hold up too well over time.

See https://secure.phabricator.com/D21484

This is actually how I started down this rabbithole. It seemed odd to
me that git-http-backend was exiting non-zero in the first place. Then
again, I didn't much understand packfile negotiation, so it was just a
hunch that there was some kind of bug. Knowing more about the protocol
now, it definitely seems buggy.

>
> As somebody who has admin'd a busy git site, unexpected client network
> drops are just a fact of life and you have to look past them in your
> logs. But I still think it's worth keeping it as uncluttered as possible
> and having upload-pack handle this without an error message.
>
> > Instead, upload-pack should gently peek for an EOF between the sending
> > of shallow/unshallow lines (followed by flush) and the reading of client
> > haves. If the client has hung up at this point, exit normally.
>
> Should we do this only if we saw a deepen line? From my reading of the
> client code, that's the only thing that would cause this early request.
> I don't know if there's any particular advantage to being more strict
> here.

I'm not sure what would be more correct. It seems to go against the
"server doesn't care" in a stateless case to be that strict, but maybe
there are additional benefits to strictness I'm not aware of.

>
> If we're _not_ going to be strict, then I actually wonder if we ought to
> simply teach get_common_commits() that seeing an EOF is OK in stateless
> mode, wherever it comes. It can't possibly impact the correctness of the
> protocol conversation (since we're stateless and the client is gone),
> but maybe it's useful if you're trying to count how often clients really
> do hang up.

I originally took that approach, but gently handling an EOF in the
get_common_commits loop resulted in a NAK being sent back because of:

                if (packet_reader_read(reader) != PACKET_READ_NORMAL) {
                        [...]
                        if (data->have_obj.nr == 0 || data->multi_ack)
                                packet_write_fmt(1, "NAK\n");
                        [...]
                        if (data->stateless_rpc)
                                exit(0);
                        [...]
                }

which the client died on with an "expected shallow list" message. I
didn't see a straightforward way of modifying the conditions _inside_
the loop while ensuring I wasn't changing any expected behavior upon
EOF.

>
> > --- /dev/null
> > +++ b/t/t9904-upload-pack-stateless-timely-eof.sh
>
> We usually try to group related tests by number. Maybe t5705 would be a
> better spot? I also wondered if this could go into t5704, but its title
> is "protocol violations". It's not clear to me yet if this is a
> violation that happens to be mostly harmless, or something we need to be
> doing. :)

Ah! Okay, I was wondering about that, whether the numbered prefixes
were serial or otherwise meaningful. I'll try to group it
appropriately.

>
> > +test_expect_success 'upload-pack outputs flush and exits ok' '
> > +     test_commit initial &&
> > +     head=$(git rev-parse HEAD) &&
> > +     hexsz=$(test_oid hexsz) &&
> > +
> > +     printf "%04xwant %s\n%04xshallow %s\n000ddeepen 1\n0000" \
> > +             $(($hexsz + 10)) $head $(($hexsz + 13)) $head >request &&
>
> We have a helper function that makes this a bit easier to read:
>
> diff --git a/t/t9904-upload-pack-stateless-timely-eof.sh b/t/t9904-upload-pack-stateless-timely-eof.sh
> index f8385a7ebd..1108401e8f 100755
> --- a/t/t9904-upload-pack-stateless-timely-eof.sh
> +++ b/t/t9904-upload-pack-stateless-timely-eof.sh
> @@ -9,10 +9,13 @@ D=$(pwd)
>  test_expect_success 'upload-pack outputs flush and exits ok' '
>         test_commit initial &&
>         head=$(git rev-parse HEAD) &&
> -       hexsz=$(test_oid hexsz) &&
>
> -       printf "%04xwant %s\n%04xshallow %s\n000ddeepen 1\n0000" \
> -               $(($hexsz + 10)) $head $(($hexsz + 13)) $head >request &&
> +       {
> +               packetize "want $head" &&
> +               packetize "shallow $head" &&
> +               packetize "deepen 1" &&
> +               printf "0000"
> +       } >request &&
>
>         git upload-pack --stateless-rpc "$(pwd)" <request >actual &&
>
>
> > +     git upload-pack --stateless-rpc "$(pwd)" <request >actual &&
>
> You can just use "." here, which is a little shorter. It's not entirely
> cosmetic; the difference between $PWD and $(pwd) on Windows always trips
> me up, so I try to avoid using either whenever I can. ;)

Very cool re: the helpers. I'll make those changes.

>
> It would be nice if we could test this through a real use of Git, but it
> might not be worth the hassle. I guess we'd have to mine the apache logs
> in one of our http test scripts to see if upload-pack failed. And I
> guess if we _do_ change the client side to stop sending the extra
> request but want to treat historical clients more gracefully, we'd still
> need a manual test like this.

Yeah I just went with the minimal case I'm familiar with, but I'm game
to set up a more thorough case with some guidance.

>
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -1344,7 +1344,18 @@ void upload_pack(struct upload_pack_options *options)
> >                                  PACKET_READ_DIE_ON_ERR_PACKET);
>
> The actual fix looks correct to me, modulo the alternatives I raised
> earlier.
>
> This function only handles the v0 protocol. For v2, we end up in
> upload_pack_v2(). But my reading of the client side do_fetch_pack_v2()
> is that it _doesn't_ send this extra request. And a simple test seems to
> confirm it. Which gives me further pause as to whether the extra request
> is necessary for v0.

When I do a trace using v2, I see two roundtrip requests as well. I
haven't tested the exit status of git-upload-pack in that case
however. It's getting late for me but I'll investigate tomorrow.

>
>
> Well, that review ended up a bit longer than I had imagined. So let me
> add what I should have said at the top: welcome to the Git list, and
> thanks for looking into this issue. :)

Thanks again! I appreciate all the feedback.