On Thu, May 7, 2020 at 1:36 PM Jeff King <peff@xxxxxxxx> wrote: > > On Thu, May 07, 2020 at 11:58:29AM +0200, Christian Couder wrote: > > > I am not sure why upload_pack_v2() is called twice. Also it seems > > that another similar issue might not be fixed by this. So this patch > > is RFC for now. > > It's because of the request/response model of v2. The client side of the > conversation looks something like: > > - client connects transport to server > > - client issues command=ls-refs to get the ref advertisement > > - client issues command=fetch and sends wants/have; the server may > respond with a pack when the negotiation is finished, but may also > respond with acks, etc, asking for more rounds of negotiation > > - if there's no pack, the client then issues another command=fetch, > repeating until we get a pack > > In stateless git-over-http, each of those request/response pairs happens in > its own server-side process, because the webserver kicks off a separate > CGI for each request. But v2 adapts git://, ssh://, and direct-pipe > connections to use the same request/response model, but all connected to > a single Git process on the other side. > > So each upload_pack_v2() call needs to start with a clean slate; it > can't assume anything about previous rounds, and it needs to clean up > any allocated data from those rounds. > > So your patch is going in the right direction, but we already have other > similar data handled by upload_pack_data, which has its own init/clear > functions. Probably the filter data should go in there, too. Thank you for the detailed explanations! It looks like 'struct upload_pack_data' is indeed used by upload_pack_v2(). It's not used by upload_pack() though. So if I move 'struct list_objects_filter_options filter_options' there, I would need to also make upload_pack() either have its own filter_options or use 'struct upload_pack_data' too. > This is an easy bug to introduce, and I suspect we'll see equivalent > ones in the future (if there aren't already more lurking; there's a lot > of global data in Git, and I really have no idea what subtle > interactions one could see). One thing we could do is to truly run each > v2 command request in its own process, just like git-over-http does. But > there are a lot of complications there around holding the pipe/socket > open, how the parent v2 serving process interacts with the child > requests (e.g., noticing errors), etc. So I'd worry that going that > direction would be just as likely to introduce bugs as fix them. I agree that it seems better to just get rid of the global data used by the upload pack functions. > > diff --git a/upload-pack.c b/upload-pack.c > > index 902d0ad5e1..4e22e46294 100644 > > --- a/upload-pack.c > > +++ b/upload-pack.c > > @@ -68,7 +68,6 @@ static const char *pack_objects_hook; > > static int filter_capability_requested; > > static int allow_filter; > > static int allow_ref_in_want; > > -static struct list_objects_filter_options filter_options; > > Just looking at nearby bits of global data that might want similar > treatment: > > - allow_filter and allow_ref_in_want are set by the server-side config. > So they're persisted between upload_pack_v2() calls, but we'd > overwrite them by calling git_config() in each incarnation > > - filter_capability_requested is set based on client input, but it's > only used in v1 > > But boy there are a lot of global variables there to trace through. Yeah, unfortunately. > Probably a more fruitful way to find similar problems is to look at the > v2 process_args() to see what it touches. It looks like options like > ofs_delta probably ought to be part of upload_pack_data, too. We didn't > notice because it would require a client doing something like: > > command=fetch > ofs-delta [claims to support ofs-delta] > 0001 > [wants and haves that don't result in a pack yet...] > 0000 > command=fetch > [do not claim to support ofs-delta!] > 0001 > [wants and haves that result in a pack] > 0000 > > The server _shouldn't_ use ofs-delta there (and wouldn't over http, > where the stateless server side that generates the pack knows only about > that second request). But in git-over-ssh, it would. > > A client who does this is stupid and wrong (and I didn't check the > protocol spec, but it could very well be violating the spec). So I don't > think it's _that_ big a deal. But it would be nice if all of those > globals got moved into upload_pack_data, and both v1 and v2 just used it > consistently. Unfortunately as I discuss a bit above 'struct upload_pack_data' is not used by v1, only by v2. I think making v1 use upload_pack_data might be nice, but it's a separate issue. So I am tempted to just improve the commit message, adding some information from you and from this discussion, and then re-sending. Another intermediate solution would be to add filter_options to 'struct upload_pack_data' for v2 and to use filter_options directly for v1.