Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, May 14, 2020 at 02:23:44PM -0700, Junio C Hamano wrote:
> > > I think something like git-caching-proxy would be a neat project,
> > > because it would significantly improve mirroring for CI deployments
> > > without requiring that each individual job implements clone.bundle
> > > prefetching.
> >
> > What are we improving with such a proxy, though?
> >
> > Not bandwidth to the client, apparently.
>
> Well, if it sits in front of the CI subnet, then it *does* save
> bandwidth.

Agreed.

> Here's an example with the exact situation we have:
>
> - the Gerrit server is on the US West Coast
> - the CI builder is on the East Coast
> - each CI job does a full transfer of the multi-MB repo across the
>   continent, even when cloning shallow
>
> We solve this by having a local mirror of the repository, but this
> requires active mirroring to be pre-setup. A caching proxy that could:
>
> - receive a request for a repository
> - stream the response back to the client
> - cache objects locally
> - use local cache to construct future requests, so only missing objects
>   are fetched from the remote repo regardless of the haves on the actual
>   client...

An off-the-shelf HTTP caching proxy (e.g. polipo, Squid) could do
a good enough job with dumb HTTP clones (via GIT_SMART_HTTP=0 env).

With well-packed repos, the dumb HTTP transfer cost shouldn't be
too high (and git 2.10+ got way faster on the client side with
poorly-packed repos, thanks to the Linux kernel-derived list.h).

The occasional full repack on the source git server will
invalidate caches and result in a giant download; but it's
better than no caching at all and doing giant cross-country
transfers all day long.

That said, I'm not sure if any client-side caching proxies can
MITM HTTPS and save bandwidth with HTTPS everywhere, nowadays.
I seem to recall polipo being abandoned because of HTTPS.
Maybe there's a caching HTTPS MITM proxy out there...
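
To make the dumb-HTTP idea above concrete, here is a rough sketch of how a CI job might drive it. The proxy address, the repository URL, and both function names are made up for illustration. It also assumes the source server exposes the repo over plain http:// with `git update-server-info` kept current, which dumb HTTP needs. The proxy is passed through git's `http.proxy` setting, and GIT_SMART_HTTP=0 is set in the environment as mentioned above.

```python
import os
import subprocess

# Hypothetical proxy address; substitute whatever caching proxy
# (e.g. Squid) sits in front of the CI subnet.
PROXY_URL = "http://cache.example:3128"


def dumb_http_clone_command(repo_url, dest, proxy_url=PROXY_URL):
    """Build the git argv and environment for a dumb-HTTP clone via a proxy."""
    if not repo_url.startswith("http://"):
        # A plain caching proxy cannot see inside HTTPS traffic,
        # so this sketch only accepts unencrypted URLs.
        raise ValueError("expected a plain http:// URL: %r" % (repo_url,))
    env = dict(os.environ)
    # Ask git to skip the smart protocol so it fetches static files
    # (refs, packs, loose objects) that a generic HTTP cache can store.
    env["GIT_SMART_HTTP"] = "0"
    argv = ["git", "-c", "http.proxy=" + proxy_url, "clone", repo_url, dest]
    return argv, env


def clone_via_cache(repo_url, dest, proxy_url=PROXY_URL):
    """Run the clone; raises CalledProcessError if git fails."""
    argv, env = dumb_http_clone_command(repo_url, dest, proxy_url)
    subprocess.run(argv, env=env, check=True)
```

A job would then call something like `clone_via_cache("http://git.example/repo.git", "repo")`. Whether the proxy actually serves pack files from its cache depends on its own configuration (object size limits in particular), which this sketch does not cover.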