Re: Git Scaling: What factors most affect Git performance for a large repo?

On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>
>  * Around 500k commits
>  * Around 100k tags
>  * Around 5k branches
>  * Around 500 commits/day, almost entirely to the same branch
>  * 1.5 GB .git checkout.
>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)

It would be nice if you could make an anonymized version of this repo
public. Working on a "real" large repo is better than working on an
artificial one.

> But actually most of "git fetch" is spent in the reachability check
> subsequently done by "git-rev-list" which takes several seconds. I

I wonder if reachability bitmaps could help here.
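
To make the idea concrete, here is a rough toy model of what a
reachability bitmap buys you. It is only a sketch: the commit graph
and the helper names (PARENTS, build_bitmaps, missing_from) are made
up for illustration, and this is not Git's on-disk bitmap format.
Each commit gets one integer whose bits mark every commit reachable
from it, so "what is reachable from the new tip but not from what we
already have" becomes a couple of bit operations instead of a walk.

```python
# Toy model of reachability bitmaps. NOT Git's actual bitmap format;
# the graph and all names here are hypothetical.
from typing import Dict, List, Tuple

# Hypothetical history: commit -> parents, listed parents-first.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],  # merge commit
}


def build_bitmaps(parents: Dict[str, List[str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (index, bitmaps); bit i of bitmaps[c] is set iff commit i is reachable from c."""
    index = {commit: i for i, commit in enumerate(parents)}
    bitmaps: Dict[str, int] = {}
    for commit, ps in parents.items():  # relies on parents-first ordering
        bits = 1 << index[commit]
        for p in ps:
            bits |= bitmaps[p]  # inherit everything the parent can reach
        bitmaps[commit] = bits
    return index, bitmaps


def missing_from(new_tip: str, have_tips: List[str],
                 index: Dict[str, int], bitmaps: Dict[str, int]) -> List[str]:
    """Commits reachable from new_tip but from none of have_tips."""
    have = 0
    for tip in have_tips:
        have |= bitmaps[tip]
    want = bitmaps[new_tip] & ~have
    return sorted(c for c, i in index.items() if (want >> i) & 1)


index, bitmaps = build_bitmaps(PARENTS)
print(missing_from("E", ["C"], index, bitmaps))  # ['D', 'E']
```

With 5k branches and 100k tags the interesting part is the OR over
everything you already have. Whether that actually beats the walk in
your repo is something you would have to measure.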

> haven't looked into it but there's got to be room for optimization
> there, surely it only has to do reachability checks for new refs, or
> could run in some "I trust this remote not to send me corrupt data"
> completely mode (which would make sense within a company where you can
> trust your main Git box).

No, it's not just about trusting the server side; it's also about
catching data corruption on the wire. We have a trick to avoid the
reachability check in the clone case, where the check is much more
expensive than for a fetch. Maybe we could do something further to
help the fetch case _if_ reachability bitmaps don't help.
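
As a rough illustration of why the check costs more for a clone than
for a fetch, and why it still catches a damaged transfer, here is a
small toy model. It is a sketch under assumptions, not Git's code: the
object store is a plain dict, the history is a made-up linear chain,
and check_connected is a hypothetical name. The walk starts at the new
tips and stops at anything already known to be good. A fetch therefore
visits only the new commits, while a clone has nothing to stop at.

```python
# Toy connectivity check. NOT Git's implementation; the store layout
# and all names are hypothetical.
from typing import Dict, Iterable, List, Set, Tuple


def check_connected(new_tips: Iterable[str], already_have: Set[str],
                    store: Dict[str, List[str]]) -> Tuple[bool, int]:
    """Walk parents from new_tips, stopping at commits in already_have.

    Returns (ok, visited): ok is False as soon as a needed commit is
    missing from the store; visited counts the commits checked so far.
    """
    seen: Set[str] = set()
    stack = list(new_tips)
    while stack:
        commit = stack.pop()
        if commit in seen or commit in already_have:
            continue  # boundary: history we already trust
        if commit not in store:
            return False, len(seen)  # object lost or damaged in transfer
        seen.add(commit)
        stack.extend(store[commit])
    return True, len(seen)


# Hypothetical linear history c0 <- c1 <- ... <- c999.
store = {f"c{i}": ([f"c{i - 1}"] if i else []) for i in range(1000)}

# Fetch: we already had c0..c989, so only the 10 new commits are walked.
had = {f"c{i}" for i in range(990)}
print(check_connected(["c999"], had, store))    # (True, 10)

# Clone: nothing to stop at, so the whole history is walked.
print(check_connected(["c999"], set(), store))  # (True, 1000)

# A commit that went missing on the wire is caught.
broken = {k: v for k, v in store.items() if k != "c995"}
print(check_connected(["c999"], had, broken))   # (False, 4)
```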
-- 
Duy