To clarify: I'm talking about a server-side-only cache which behaves much
like a `tar` file: it is a flat version of exactly(*) what ends up on the
client's storage. When a client runs `git clone` and there's a valid cache
on the other end, that's all that gets streamed.

Konstantin's point that a repo like Linux is bound to see little or no
benefit (in fact, it would just constantly invalidate and rewrite the
~1 GB cache) is reasonable. This feature definitely targets the "niche"
audience of repos with fewer pushes to master than clones.

Bryan is exactly on the right track for what I'm referring to: the CDN
approach did come to mind (and is superior in nearly every way). Junio
nailed it: I'm not hoping for anything revolutionary here, just hoping to
reduce the redundant steps in clone down to a single (presumably faster)
step.

If the community agrees that there's little or no benefit given the
limitations of a "cache for master and that's all," I'm also more than
capable of designing a more useful (and more complex) graph/reduce-based
solution which could dynamically bundle the most statistically relevant
data for whatever context the code is working in, though I can't commit
to any sort of deadline for that kind of contribution.

On Thu, May 14, 2020 at 2:05 PM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
>
> On Thu, May 14, 2020 at 04:33:26PM -0400, Konstantin Ryabitsev wrote:
> > > Assuming my idea doesn't contradict other best practices or standards
> > > already in place, I'd like to transform the typical `git clone` flow
> > > from:
> > >
> > >   Cloning into 'linux'...
> > >   remote: Enumerating objects: 4154, done.
> > >   remote: Counting objects: 100% (4154/4154), done.
> > >   remote: Compressing objects: 100% (2535/2535), done.
> > >   remote: Total 7344127 (delta 2564), reused 2167 (delta 1612),
> > >   pack-reused 7339973
> > >   Receiving objects: 100% (7344127/7344127), 1.22 GiB | 8.51 MiB/s, done.
> > >   Resolving deltas: 100% (6180880/6180880), done.
> > >
> > > To subsequent clones (until cache invalidated) using the "flattened
> > > cache" version (presumably built while fulfilling the first clone
> > > request above):
> > >
> > >   Cloning into 'linux'...
> > >   Receiving cache: 100% (7344127/7344127), 1.22 GiB | 8.51 MiB/s, done.
> >
> > I don't think it's a common workflow for someone to repeatedly clone
> > linux.git. Automated processes like CI would be doing it, but they tend
> > to blow away the local disk between jobs, so they are unlikely to
> > benefit from any native git local cache for something like this (in
> > fact, we recommend that people use clone.bundle files for their CI
> > needs, as described here:
> > https://www.kernel.org/best-way-to-do-linux-clones-for-your-ci.html).
>
> If the goal is a git local cache, we have this today. I'm not sure
> this is what Caleb was asking for, though:
>
>   git clone --bare https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git base
>   git clone --reference base https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git ext4
>
>   - Ted
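
For anyone who wants to try the flow described above today, the
clone.bundle approach referenced in the kernel.org link can be
approximated with stock git commands. This is only a rough sketch: the
paths, hostnames, and the choice of a post-receive hook or cron job to
regenerate the bundle are hypothetical, not part of the proposal itself.

  # Server side (hypothetical paths): rebuild the flat "cache" file
  # after a push to master, e.g. from a post-receive hook or cron job.
  git -C /srv/git/linux.git bundle create /var/www/cache/linux.bundle master

  # Client side: download the pre-built file, clone from it, then point
  # the remote back at the real repository and catch up.
  curl -O https://cache.example.org/linux.bundle
  git clone -b master linux.bundle linux
  cd linux
  git remote set-url origin https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  git fetch origin

The server-side cache being proposed would effectively fold the download,
clone, and remote fix-up steps into a single plain `git clone`.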