Hi Elijah,

Thanks for your response! Sorry for not responding sooner.

> `git pull` will both fetch updates for _all_ branches, as well as
> merge (or rebase) the updates for the current branch.

The `git pull` here is actually `git pull origin master`. My guess is
that it fetches objects and blobs for the master branch only, in which
case the partial clone pull and the full clone pull should perform
about equally. (How I plan to verify what each pull actually transfers
is sketched at the bottom of this mail, below the quoted text.)

> I'm a bit surprised by the `git commit` case; how can it take so long
> on your repo (2-3s)?

I run these with `--no-verify`, so hooks don't affect these benchmarks.

How does git know, during object lookup, that it is working in a
partial clone? And how does it decide that a missing object needs to
be fetched from the remote rather than being reported as a
missing-object error? (My current reading of the relevant config is
also at the bottom of this mail.)

On Fri, Nov 8, 2024 at 10:54 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Wed, Nov 6, 2024 at 8:52 PM Manoraj K <mkenchugonde@xxxxxxxxxxxxx> wrote:
> >
> > Bump
> >
> > On Mon, Oct 28, 2024 at 4:00 PM Manoraj K <mkenchugonde@xxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > We've conducted benchmarks comparing Git operations between a fully
> > > cloned and partially cloned repository (both using sparse-checkout).
> > > We'd like to understand the technical reasons behind the consistent
> > > performance gains we're seeing in the partial clone setup.
> > >
> > > Benchmark Results:
> > >
> > > Full Clone + Sparse-checkout:
> > > - .git size: 8.5G
> > > - Git index size: 20MB
> > > - Pack objects: 18,761,646
> > > - Operations (mean ± std dev):
> > >   * git status: 0.634s ± 0.004s
> > >   * git commit: 2.677s ± 0.019s
> > >   * git checkout branch: 0.615s ± 0.005s
> > >   * git pull (no changes): 5.983s ± 0.391s
> > >
> > > Partial Clone + Sparse-checkout:
> > > - .git size: 2.0G
> > > - Git index size: 20MB
> > > - Pack objects: 13,560,436
> > > - Operations (mean ± std dev):
> > >   * git status: 0.575s ± 0.012s (9.3% faster)
> > >   * git commit: 2.164s ± 0.032s (19.2% faster)
> > >   * git checkout branch: 0.724s ± 0.154s
> > >   * git pull (no changes): 1.866s ± 0.018s (68.8% faster)
> > >
> > > Key Questions:
> > > 1. What are the technical factors causing these performance
> > > improvements in the partial clone setup?
> > > 2. To be able to get these benefits, is there a way to convert our
> > > existing fully cloned repository to behave like a partial clone
> > > without re-cloning from scratch?
> > >
> > > Appreciate any insights here.
> > >
> > > Best regards,
> > > Manoraj K
>
> Taking some wild guesses:
>
> `git pull` will both fetch updates for _all_ branches, as well as
> merge (or rebase) the updates for the current branch. Your "no
> changes" probably means there's no merge/rebase needed, but that
> doesn't mean there was nothing to fetch. A partial clone isn't going
> to download all the blobs, so it has much less to download and is thus
> significantly faster.
>
> `git checkout branch` would likely be slower in a partial clone
> because sometimes objects are missing and need to be downloaded. And
> indeed, it shows as being a little slower for you.
>
> `git status` is harder to guess at. The only guess I can come up with
> for this case is that fewer objects means faster lookup (I'm not
> familiar with the packfile code, but think object lookups use a
> bisect to find the objects in question, and fewer objects to bisect
> would make things faster if so); not sure if this could account for a
> 9% difference, though.
> Maybe someone who understands packfiles, object lookup, and promisor
> remotes has a better idea here?
>
> I'm a bit surprised by the `git commit` case; how can it take so long
> on your repo (2-3s)? Do you have commit hooks in place? If so, what
> are they doing? (And if you do, I suspect whatever they are doing is
> responsible for the differences in timings between the partial clone
> and the full clone, so you'd need to dig into them.)
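
To take some of the guesswork out of the `git pull` comparison above,
here is roughly how I plan to check what each pull actually transfers
(the /tmp/pull-packets.log path is just an example):

    # count local loose and packed objects before and after the pull
    git count-objects -v

    # log the ref advertisement and pack negotiation to a file
    GIT_TRACE_PACKET=/tmp/pull-packets.log git pull origin master

    # list what the server advertises for master, without fetching anything
    git ls-remote origin master

If the full clone's packet log shows many more refs being advertised
or a much larger pack being negotiated, I think that would explain
most of the ~6s vs ~1.9s gap we measured.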
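
To make my question about object lookup more concrete: from the
partial clone documentation, my understanding is that a filtered clone
records the remote as a "promisor" in the repository config, roughly
like this (`blob:none` is only an example filter, not necessarily the
one we used):

    [remote "origin"]
        promisor = true
        partialclonefilter = blob:none

and that packfiles downloaded from that remote get a matching
*.promisor file under .git/objects/pack/. Is that what git consults
when it decides to lazily fetch a missing object from the promisor
remote instead of reporting it as a missing object? Please correct me
if I've misread the docs.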