For a repo like git itself, the way git currently builds its data (including the `checkout` portion, in fact) does compete directly with the "cached result" methodology! Holy shit guys, I'm impressed as hell.

tl;dr: The way I read the raw numbers, `git` ends up being as fast as (or faster than) a "cache" of the .git folder. Without doing further research, I'm inclined to agree that the previously mentioned bitmap method is already effectively as efficient as (more efficient than!?) a cache.

Methodology/Reasoning:
  virtualized: verified zero network chatter on eth0 before and after each test.
  tcpflow: to gather the bits for the entire transaction, with the listener started just before `git clone` was executed and closed just after it ended. (not worrying about protocols/overhead)
  tar: to compare the size of the repository on disk with the tcpflow results. (not worrying about compensating for headers/metadata/overhead)
  gzip: to theoretically (I haven't checked anything) compensate for seemingly arbitrary size differences when downloading over HTTPS.
  time: (really) rough measure of execution time.

Commands used to generate files:
  *.tcpflow: `sudo tcpflow -p -c -i eth0 > $filename.tcpflow`
  *.tar: `tar cf $filename.tar .git`
  *.gz: `gzip -9 $filename.tar`

Results:
  75M kernelorg.tar
  72M kernelorg.tar.gz
  69M kernelorg_git.tcpflow
  69M kernelorg_https.tcpflow

  145M github.tar
  143M github.tar.gz
  143M github_git.tcpflow
  142M github_https.tcpflow

Other Tests (sanity checks):
  Cloned a gitea mirror of kernel.org's git:
    69M gitea_git.tcpflow
    69M gitea_https.tcpflow

  Cloned a bitbucket mirror of kernel.org's git:
    69M bitbucket_git.tcpflow
    69M bitbucket_https.tcpflow

$ time git clone git://git.kernel.org/pub/scm/git/git.git
Cloning into 'git'...
remote: Enumerating objects: 15475, done.
remote: Counting objects: 100% (15475/15475), done.
remote: Compressing objects: 100% (861/861), done.
remote: Total 287977 (delta 14910), reused 14907 (delta 14610), pack-reused 272502
Receiving objects: 100% (287977/287977), 66.09 MiB | 4.87 MiB/s, done.
Resolving deltas: 100% (217420/217420), done.

real 0m20.000s
user 0m15.414s
sys 0m1.606s

$ time wget https://calebgray.com/public/kernelorg.tar.gz
--2020-05-25 06:11:29-- https://calebgray.com/public/kernelorg.tar.gz
Resolving calebgray.com (calebgray.com)... 192.3.203.78
Connecting to calebgray.com (calebgray.com)|192.3.203.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74593708 (71M) [application/octet-stream]
Saving to: ‘kernelorg.tar.gz’

kernelorg.tar.gz 100%[========================================================================================>] 71.14M 4.81MB/s in 19s

2020-05-25 06:11:48 (3.79 MB/s) - ‘kernelorg.tar.gz’ saved [74593708/74593708]

real 0m19.420s
user 0m0.030s
sys 0m0.280s

Thanks everyone for your input and time! I love git, you guys do great work!

P.S. I ran a few other benchmarks outside of these, and the "real" time it took to download always worked out to be more or less consistent with the reported transfer rate (as told by my router, as well), for both `git` and `wget`.

P.P.S. I haven't investigated why the github repo is nearly twice the size of the kernel.org hosted copy. That one stands out as potentially part of the proxy discussion, or there's actually a difference in the repo's data. Curiosity will likely get the best of me eventually.

On Mon, May 18, 2020 at 9:40 PM Konstantin Tokarev <annulen@xxxxxxxxx> wrote:
>
> 18.05.2020, 01:12, "Konstantin Ryabitsev" <konstantin@xxxxxxxxxxxxxxxxxxx>:
> > On Fri, May 15, 2020 at 09:42:57PM +0000, Eric Wong wrote:
> >> That said, I'm not sure if any client-side caching proxies can
> >> MITM HTTPS and save bandwidth with HTTPS everywhere, nowadays.
> >> I seem to recall polipo being abandoned because of HTTPS.
> >> Maybe there's a caching HTTPS MITM proxy out there...
> >
> > Right, this can't operate as a transparent proxy.
>
> AFAIK, Squid can do MITM, caching and operate transparently.
> In the past it was done via ssl_bump directive, but seems like syntax changed a bit
> in modern versions.