Re: [QUESTION] Performance comparison: full clone + sparse-checkout vs partial clone + sparse-checkout

Hi Elijah,

Thanks for your response! Sorry for not responding sooner.

-- `git pull` will both fetch updates for _all_ branches, as well as
merge (or rebase) the updates for the current branch.

The `git pull` in our benchmark is actually `git pull origin master`,
so I'd expect it to fetch objects (including blobs) for the master
branch only. In that case, I would have expected the pull to perform
about the same in the partial clone and the full clone.

-- I'm a bit surprised by the `git commit` case; how can it take so
long on your repo (2-3s)?

I ran these with `--no-verify`, so hooks don't affect these benchmarks.

How does Git know, during object lookup, that the repository is a
partial clone? And how does it decide that a missing object needs to
be fetched from the remote, rather than treating it as a "object not
found" error?


On Fri, Nov 8, 2024 at 10:54 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Wed, Nov 6, 2024 at 8:52 PM Manoraj K <mkenchugonde@xxxxxxxxxxxxx> wrote:
> >
> > Bump
> >
> > On Mon, Oct 28, 2024 at 4:00 PM Manoraj K <mkenchugonde@xxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > We've conducted benchmarks comparing Git operations between a fully
> > > cloned and partially cloned repository (both using sparse-checkout).
> > > We'd like to understand the technical reasons behind the consistent
> > > performance gains we're seeing in the partial clone setup.
> > >
> > > Benchmark Results:
> > >
> > > Full Clone + Sparse-checkout:
> > > - .git size: 8.5G
> > > - Git index size: 20MB
> > > - Pack objects: 18,761,646
> > > - Operations (mean ± std dev):
> > >   * git status: 0.634s ± 0.004s
> > >   * git commit: 2.677s ± 0.019s
> > >   * git checkout branch: 0.615s ± 0.005s
> > >   * git pull (no changes): 5.983s ± 0.391s
> > >
> > > Partial Clone + Sparse-checkout:
> > > - .git size: 2.0G
> > > - Git index size: 20MB
> > > - Pack objects: 13,560,436
> > > - Operations (mean ± std dev):
> > >   * git status: 0.575s ± 0.012s (9.3% faster)
> > >   * git commit: 2.164s ± 0.032s (19.2% faster)
> > >   * git checkout branch: 0.724s ± 0.154s
> > >   * git pull (no changes): 1.866s ± 0.018s (68.8% faster)
> > >
> > > Key Questions:
> > > 1. What are the technical factors causing these performance
> > > improvements in the partial clone setup?
> > > 2. To be able to get these benefits, is there a way to convert our
> > > existing fully cloned repository to behave like a partial clone
> > > without re-cloning from scratch?
> > >
> > > Appreciate any insights here.
> > >
> > > Best regards,
> > > Manoraj K
>
> Taking some wild guesses:
>
> `git pull` will both fetch updates for _all_ branches, as well as
> merge (or rebase) the updates for the current branch.  Your "no
> changes" probably means there's no merge/rebase needed, but that
> doesn't mean there was nothing to fetch.  A partial clone isn't going
> to download all the blobs, so it has much less to download and is thus
> significantly faster.
>
> `git checkout branch` would likely be slower in a partial clone
> because sometimes objects are missing and need to be downloaded.  And
> indeed, it shows as being a little slower for you.
>
> `git status` is harder to guess at.  The only guess I can come up with
> for this case is that fewer objects means faster lookup (I'm not
> familiar with the packfile code, but I think object lookups use a
> bisect to find the objects in question, and fewer objects to bisect
> would make things faster if so); not sure if this could account for a
> 9% difference, though.  Maybe someone who understands packfiles,
> object lookup, and promisor remotes has a better idea here?
>
> I'm a bit surprised by the `git commit` case; how can it take so long
> on your repo (2-3s)?  Do you have commit hooks in place?  If so, what
> are they doing?  (And if you do, I suspect whatever they are doing is
> responsible for the differences in timings between the partial clone
> and the full clone, so you'd need to dig into them.)
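
As a rough check on the bisect guess quoted above, here is a minimal
Python sketch. It assumes a single pack whose index uses a 256-entry
fan-out on the first byte of the object ID, followed by a binary search
within that bucket, and it assumes IDs are spread evenly across buckets.
`max_bisect_probes` and `contains` are hypothetical helper names, not
Git functions. Real repositories usually have several packs, so treat
this as an order-of-magnitude estimate only.

```python
from bisect import bisect_left


def contains(sorted_ids, oid):
    """Binary-search a sorted list of object IDs (illustrative only)."""
    i = bisect_left(sorted_ids, oid)
    return i < len(sorted_ids) and sorted_ids[i] == oid


def max_bisect_probes(n_objects, fanout_buckets=256):
    """Worst-case comparisons for a binary search inside one fan-out bucket.

    Assumption: objects are evenly distributed over the buckets.
    """
    per_bucket = -(-n_objects // fanout_buckets)  # ceiling division
    # For m >= 1 sorted entries the worst case is floor(log2(m)) + 1,
    # which equals m.bit_length() for positive integers.
    return max(1, per_bucket.bit_length())


# Pack object counts taken from the benchmark in the quoted message.
full_clone_objects = 18_761_646
partial_clone_objects = 13_560_436

full_probes = max_bisect_probes(full_clone_objects)
partial_probes = max_bisect_probes(partial_clone_objects)
print("full clone, worst-case probes per lookup:", full_probes)
print("partial clone, worst-case probes per lookup:", partial_probes)
```

Under these assumptions the two object counts differ by about one
comparison per lookup, which fits the doubt expressed above that bisect
depth alone accounts for a 9% difference in `git status`.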

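The percentages in the quoted benchmark can be reproduced from the mean
timings with the short sketch below. The timings are copied from the
original message; `percent_faster` is a hypothetical helper name, and a
negative result means the partial clone was slower.

```python
# Mean timings in seconds, copied from the benchmark in the quoted message.
full = {
    "git status": 0.634,
    "git commit": 2.677,
    "git checkout branch": 0.615,
    "git pull (no changes)": 5.983,
}
partial = {
    "git status": 0.575,
    "git commit": 2.164,
    "git checkout branch": 0.724,
    "git pull (no changes)": 1.866,
}


def percent_faster(full_s, partial_s):
    """Relative improvement of the partial clone over the full clone, in %."""
    return (full_s - partial_s) / full_s * 100.0


results = {op: round(percent_faster(full[op], partial[op]), 1) for op in full}
for op, pct in results.items():
    print(f"{op}: {pct:+.1f}%")
```

Note that for `git checkout branch` the difference between the means
(0.109s) is smaller than the reported standard deviation of the partial
clone run (0.154s), so that result is the least conclusive of the four.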



