On Wed, Nov 6, 2024 at 8:52 PM Manoraj K <mkenchugonde@xxxxxxxxxxxxx> wrote:
>
> Bump
>
> On Mon, Oct 28, 2024 at 4:00 PM Manoraj K <mkenchugonde@xxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > We've conducted benchmarks comparing Git operations between a fully
> > cloned and partially cloned repository (both using sparse-checkout).
> > We'd like to understand the technical reasons behind the consistent
> > performance gains we're seeing in the partial clone setup.
> >
> > Benchmark Results:
> >
> > Full Clone + Sparse-checkout:
> > - .git size: 8.5G
> > - Git index size: 20MB
> > - Pack objects: 18,761,646
> > - Operations (mean ± std dev):
> >   * git status: 0.634s ± 0.004s
> >   * git commit: 2.677s ± 0.019s
> >   * git checkout branch: 0.615s ± 0.005s
> >   * git pull (no changes): 5.983s ± 0.391s
> >
> > Partial Clone + Sparse-checkout:
> > - .git size: 2.0G
> > - Git index size: 20MB
> > - Pack objects: 13,560,436
> > - Operations (mean ± std dev):
> >   * git status: 0.575s ± 0.012s (9.3% faster)
> >   * git commit: 2.164s ± 0.032s (19.2% faster)
> >   * git checkout branch: 0.724s ± 0.154s
> >   * git pull (no changes): 1.866s ± 0.018s (68.8% faster)
> >
> > Key Questions:
> > 1. What are the technical factors causing these performance
> > improvements in the partial clone setup?
> > 2. To be able to get these benefits, is there a way to convert our
> > existing fully cloned repository to behave like a partial clone
> > without re-cloning from scratch?
> >
> > Appreciate any insights here.
> >
> > Best regards,
> > Manoraj K

Taking some wild guesses:

`git pull` will both fetch updates for _all_ branches, as well as
merge (or rebase) the updates for the current branch. Your "no
changes" probably means there's no merge/rebase needed, but that
doesn't mean there was nothing to fetch. A partial clone isn't going
to download all the blobs, so it has much less to download and is thus
significantly faster.
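
One way to check the `git pull` guess above would be to time the fetch
and the merge separately, in both clones. The following is only a
minimal sketch of such a harness; `time_command` is a hypothetical
helper, not part of Git, and it reports the same mean ± std dev shape
as the benchmark in the original mail.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5, cwd=None):
    """Run `cmd` `runs` times and return (mean, stdev) of wall-clock seconds.

    `cmd` is an argument list, e.g. ["git", "fetch"]; `cwd` is the
    repository directory to run it in (hypothetical paths, chosen by you).
    """
    if runs < 2:
        # statistics.stdev needs at least two samples.
        raise ValueError("runs must be at least 2")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the timing.
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)
```

Assuming the default merge-based pull, calling
`time_command(["git", "fetch"], cwd=repo)` and
`time_command(["git", "merge", "FETCH_HEAD"], cwd=repo)` in each clone
should show whether the fetch step accounts for the gap.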

`git checkout branch` would likely be slower in a partial clone
because sometimes objects are missing and need to be downloaded. And
indeed, it shows as being a little slower for you (0.724s vs. 0.615s).

`git status` is harder to guess at. The only guess I can come up with
for this case is that fewer objects means faster lookup. (I'm not
familiar with the packfile code, but I think object lookups use a
binary search to find the objects in question, and fewer objects to
search would make things faster if so.) I'm not sure this could
account for a 9% difference, though. Maybe someone who understands
packfiles, object lookup, and promisor remotes has a better idea here?

I'm a bit surprised by the `git commit` case; how can it take so long
on your repo (2-3s)? Do you have commit hooks in place? If so, what
are they doing? (And if you do, I suspect whatever they are doing is
responsible for the differences in timings between the partial clone
and the full clone, so you'd need to dig into them.)
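
As a rough sanity check on the `git status` guess above, here is a
back-of-the-envelope sketch of how many binary-search steps each
object count would need. It assumes a plain binary search over a
single sorted list of object IDs, which is a simplification I am
introducing for illustration; it ignores multiple packs, any index
shortcuts, and caching. The object counts are the ones from the
benchmark in the original mail.

```python
import math

# Pack object counts reported in the benchmark.
FULL_CLONE_OBJECTS = 18_761_646
PARTIAL_CLONE_OBJECTS = 13_560_436


def bisect_steps(n):
    """Approximate number of comparisons for a binary search over n entries."""
    return math.log2(n)


full = bisect_steps(FULL_CLONE_OBJECTS)
partial = bisect_steps(PARTIAL_CLONE_OBJECTS)
# Fraction of comparisons saved per lookup in the smaller object set.
saving = 1 - partial / full

print(f"full: {full:.2f} steps, partial: {partial:.2f} steps, "
      f"saving: {saving:.1%}")
```

Under that assumption the two cases differ by less than one comparison
per lookup (roughly 2%), which fits the doubt above that lookup cost
alone explains a 9% difference.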