On Mon, Oct 24, 2016 at 8:29 PM, Jeff King <peff@xxxxxxxx> wrote:
> I'm looking into the oft-discussed idea of reducing the size of ref
> advertisements by having the client say "these are the refs I'm
> interested in". Let's set aside the protocol complexities for a
> moment and imagine we magically have some way to communicate a set of
> patterns to the server.
>
> What should those patterns look like?
>
> I had hoped that we could keep most of the pattern logic on the
> client-side. Otherwise we risk incompatibilities between how the client
> and server interpret a pattern. I had also hoped we could do some kind
> of prefix-matching, which would let the server look only at the
> interesting bits of the ref tree (so if you don't care about
> refs/changes, and the server has some ref storage that is hierarchical,
> they can literally get away without opening that sub-tree).
>
> The patch at the end of this email is what I came up with in that
> direction. It obviously won't compile without the twenty other patches
> implementing transport->advertise_prefixes

Yes! git-upload-pack-2 is making a comeback, in one form or another.

> but it gives you a sense of what I'm talking about.
>
> Unfortunately it doesn't work in all cases, because refspec sources may
> be unqualified. If I ask for:
>
>   git fetch $remote master:foo
>
> then we have to actually dwim-resolve "master" from the complete list of
> refs we get from the remote. It could be "refs/heads/master",
> "refs/tags/master", etc. Worse, it could be "refs/master". In that case,
> at least, I think we are OK because we avoid advertising refs directly
> below "refs/" in the first place. But if you have a slash, like:
>
>   git fetch $remote jk/foo
>
> then that _could_ be "refs/jk/foo". Likewise, we cannot even optimize
> the common case of a fully-qualified ref, like "refs/heads/foo". If it
> exists, we obviously want to use that. But if it doesn't, then it
> could be refs/something-else/refs/heads/foo.
> That's unlikely, but it
> _does_ work now, and optimizing the advertisement would break it.
>
> So it seems like left-anchoring the refspecs can never be fully correct.
> We can communicate "master" to the server, who can then look at every
> ref it would advertise and ask "could this be called master"? But it
> will be setting in stone the set of "could this be" patterns. Granted,
> those haven't changed much over the history of git, but it seems awfully
> fragile.

The first thought that comes to mind is, if left anchoring does not
work, let's support both left and right anchoring. I guess you
considered and discarded this.

If prefix matching does not work, and treating a "some-prefix" sent by
the client as the pattern "**/some-prefix" on the server side would set
the "could this be" rules in stone, how about using wildmatch? It's
flexible enough, and we have full control over the pattern matching
engine, so C Git <-> C Git should be good regardless of platform.

I understand that wildmatch is still complicated enough that a
re-implementation can easily diverge in behavior. But a pattern with
only the '*', '/**', '/**/' and '**/' wildcards (in other words, no []
or ?) could make the engine a lot simpler and still fit our needs (and
leave some room for client optimization).

> In an ideal world the client and server would negotiate to come to some
> agreement on the patterns being used. But as we are bolting this onto
> the existing protocol, I was really trying to do it without introducing
> an extra capabilities phase or extra round-trips. I.e., something like
> David Turner's "stick the refspec in the HTTP query parameters" trick,
> but working everywhere[1].

-- 
Duy
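
To give a feel for how small a matcher restricted to '*', '/**', '/**/'
and '**/' could be, here is a rough sketch. It is not wildmatch and not
code from git; the function name ref_pattern_match() and the exact
semantics chosen for each wildcard are assumptions made for illustration
only, and malformed patterns are not rejected.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper, not taken from git: returns 1 if REF matches PAT.
// Assumed wildcard semantics:
//   '*'            any run of characters inside one path component
//   "**/"          zero or more whole path components
//   trailing "**"  everything that remains
// There is no '?' or '[]' support, and patterns are not validated.
int ref_pattern_match(const char *pat, const char *ref)
{
	assert(pat != NULL && ref != NULL);

	if (!strncmp(pat, "**/", 3)) {
		// Retry the rest of the pattern at every component boundary.
		const char *p = ref;
		for (;;) {
			if (ref_pattern_match(pat + 3, p))
				return 1;
			p = strchr(p, '/');
			if (!p)
				return 0;
			p++;
		}
	}
	if (!strcmp(pat, "**"))
		return 1; // trailing "**" swallows the rest of the ref
	if (*pat == '*') {
		// A single star never crosses a '/'.
		const char *p = ref;
		for (;;) {
			if (ref_pattern_match(pat + 1, p))
				return 1;
			if (*p == '\0' || *p == '/')
				return 0;
			p++;
		}
	}
	if (*pat == '\0')
		return *ref == '\0';
	if (*pat != *ref)
		return 0;
	return ref_pattern_match(pat + 1, ref + 1);
}
```

With these assumed semantics, a client asking for "master" could send
"**/master", which matches "refs/heads/master" and "refs/tags/master"
but not "refs/heads/notmaster", while "refs/heads/*" stays within a
single level below refs/heads/.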