On Thu, May 02 2019, Eric Wong wrote:

> Stefan Beller <sbeller@xxxxxxxxxx> wrote:
>> IIRC, More than half the bandwidth of Googles git servers are used
>> for ls-remote calls (i.e. polling a lot of repos, most of them did *not*
>> change, by build bots which are really eager to try again after a minute).
>
> Thinking back at that statement; I think polling can be
> optimized in git, at least.
>
> IIRC, your repos have lots of refs; right?
> (which is why it's a bandwidth problem)
>
> Since info/refs is a static file (hopefully updated by a
> post-update hook), the smart client can make an HTTP request
> to check If-Modified-Since: to avoid the big response.
>
> The client would need to cache the mtime of the last requested
> refs file; somewhere.
>
> IOW, do refs negotiation the "dumb" way; since it's no better
> than the smart way, really. Keep doing object transfers the
> smart way.
>
> During the initial clone, smart servers could probably
> have a header informing clients that their info/refs
> is up-to-date and clients can do dumb refs negotiation.

Doing this with If-Modified-Since sounds like an easier drop-in
replacement (just needs a client change), but I wonder if ETag isn't a
better fit for this.

I.e. we'd document some convention where the ETag is a hash of the refs
the client expects to be advertised, in some format; the client then
sends that to the server. That allows the same thing without anyone
keeping more state than they keep now in their local ref store.

On the fancier side, I think Bloom filters are something that's been
discussed (and I believe someone (Twitter?) had such an internal
patch), i.e. the client sends a Bloom filter of refs they have, and the
server advertises things they don't know about yet (and due to how
Bloom filters work, some things they *do* know about already but
tripped up the Bloom filter...).
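To make the ETag idea concrete, here is a minimal sketch, not an existing git feature. It assumes a hypothetical canonical format (one "<oid> <refname>\n" line per ref, sorted by refname, hashed with SHA-256). The client computes that hash over the refs it expects and sends it in the standard HTTP If-None-Match header. The server hashes its own refs the same way and answers 304 Not Modified when they match. The helper names refs_etag() and advertise() are made up for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple


def refs_etag(refs: Dict[str, str]) -> str:
    """Hash a refname -> object id map in a hypothetical canonical form.

    Assumed convention: "<oid> <refname>\\n" per ref, sorted by refname,
    SHA-256 hex digest, quoted as an HTTP entity tag.
    """
    h = hashlib.sha256()
    for name in sorted(refs):
        h.update(f"{refs[name]} {name}\n".encode("utf-8"))
    return '"' + h.hexdigest() + '"'


def advertise(server_refs: Dict[str, str],
              if_none_match: Optional[str]) -> Tuple[int, str]:
    """Hypothetical server side of the exchange.

    If the client's ETag (computed from its local refs) equals the hash of
    the server's refs, skip the advertisement entirely.
    """
    if if_none_match == refs_etag(server_refs):
        return 304, ""  # Not Modified: the client's view is already current
    body = "".join(f"{server_refs[n]} {n}\n" for n in sorted(server_refs))
    return 200, body


if __name__ == "__main__":
    local = {"refs/heads/main": "a" * 40, "refs/tags/v1": "b" * 40}
    print(advertise(dict(local), refs_etag(local))[0])  # unchanged repo
```

Because the tag is derived purely from ref contents, neither side needs to store anything beyond the refs themselves. The catch is that client and server must agree byte-for-byte on the canonical format, which is why it would need to be documented.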
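The Bloom filter variant can be sketched in the same spirit. This is an illustrative toy, not the internal patch mentioned above (whose details aren't known here). The filter size (1024 bits), the number of hash positions (4), and keying each entry on "<oid> <refname>" so that a moved ref counts as new are all assumptions. One caveat of this particular arrangement: since the filter holds the client's refs, a false positive makes the server treat a ref as already known and skip it. Any real protocol would need a way to recover from that.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per item."""

    def __init__(self, m_bits: int = 1024, k: int = 4) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions from one SHA-256 digest via double hashing;
        # forcing h2 odd keeps the step coprime with a power-of-two m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def ref_key(name: str, oid: str) -> str:
    # Hypothetical encoding: key on name *and* oid, so an updated ref is "new".
    return f"{oid} {name}"


def client_filter(local_refs: Dict[str, str]) -> BloomFilter:
    """Client side: put every ref it already has into the filter."""
    bf = BloomFilter()
    for name, oid in local_refs.items():
        bf.add(ref_key(name, oid))
    return bf


def refs_to_advertise(server_refs: Dict[str, str], bf: BloomFilter) -> List[str]:
    """Server side: advertise refs the filter says the client lacks.

    A ref absent from the filter is definitely unknown to the client;
    a false positive makes the server wrongly leave a ref out.
    """
    return sorted(n for n, oid in server_refs.items()
                  if not bf.might_contain(ref_key(n, oid)))
```

For example, a client holding an old refs/heads/main and refs/tags/v1 would, against a server where main has moved and a new branch exists, normally be told only about the moved main and the new branch.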