Re: Git is not scalable with too many refs/*

On Wednesday 28 September 2011 18:59:09 Martin Fick wrote: 
> Julian Phillips <julian@xxxxxxxxxxxxxxxxx> wrote: 
> > On Wed, 28 Sep 2011 16:10:48 -0600, Martin Fick wrote: 
> >> So with that bug fixed, the thing taking the most time 
> >> now for a git checkout with ~100K refs seems to be the 
> >> orphan check as Thomas predicted. The strange part with 
> >> this, is that the orphan check seems to take only about 
> >> ~20s in the repo where the refs aren't packed. However, 
> >> in the repo where they are packed, this check takes at 
> >> least 5min! This seems a bit unusual, doesn't it? Is 
> >> the filesystem that much better at indexing refs than 
> >> git's pack mechanism? Seems unlikely, the unpacked refs 
> >> take 312M in the FS, the packed ones only take about 
> >> 4.3M. I suspect there is something else unexpected
> >> going on here in the packed ref case. 
> >> 
> >> Any thoughts? I will dig deeper... 
> > 
> > I think the problem is that resolve_ref() walks a linked
> > list searching for the packed ref. Does this mean that
> > packed refs are not indexed at all? 
> > Are you sure that it is walking the linked list that is the problem?

It sure seems like it.

> I've created a test repo with ~100k refs/changes/... style refs, and 
> ~40000 refs/heads/... style refs, and checkout can walk the list of 
> ~140k refs seven times in 85ms user time including doing whatever other 
> processing is needed for checkout. The real time is only 114ms - but 
> then my test repo has no real data in.

If I understand what you are saying, it sounds like you do not have a very good test case. The time checkout takes depends on how long it takes to find a ref with the sha1 that you are on. If that sha1 is so early in the list of refs that it only took 7 traversals to find it, then the test does not exercise the slow path. You should probably try making an orphaned ref (check out a detached head, commit to it); that is probably the worst case, since checkout should then have to search all 140K refs before eventually giving up.
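
To illustrate the argument above, here is a toy model in Python. It is only a sketch of the reasoning, not of what git's checkout actually does: the names `lookup` and `walks_until_found` are made up, the refs are fake, and it assumes each ref is resolved by one linear walk of the packed list. The list is kept to 2,000 entries so it runs quickly.

```python
# Toy model only, NOT git's code. Assumption: every ref resolution is
# one linear walk over the packed list (as a linked-list search would be).

def lookup(packed, name):
    """Walk the (name, sha) list linearly; return (sha, entries inspected)."""
    steps = 0
    for ref_name, sha in packed:
        steps += 1
        if ref_name == name:
            return sha, steps
    return None, steps

def walks_until_found(packed, head_sha):
    """Resolve refs one at a time until one points at head_sha.
    Returns (number of list walks, total entries inspected)."""
    walks = total = 0
    for name, _ in packed:
        sha, steps = lookup(packed, name)
        walks += 1
        total += steps
        if sha == head_sha:
            break
    return walks, total

# 2,000 made-up refs; each sha is unique and none is all 'f'.
packed = [("refs/changes/%d" % i, "%040x" % i) for i in range(2000)]

# HEAD matches the 7th ref in the list: found after 7 walks.
early = walks_until_found(packed, packed[6][1])

# Orphaned HEAD: no ref has this sha, so every ref gets resolved.
orphan = walks_until_found(packed, "f" * 40)

print("early hit:", early)
print("orphan:   ", orphan)
```

In this model the early hit costs 7 walks, while the orphaned case costs one walk per ref, so the total work grows roughly with the square of the number of refs.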

Again, if I understand what you are saying, if it took 85ms for 7 traversals, then each traversal takes approximately 10ms; that's only 100/s! If you have to traverse the list 140K times, that should work out to 1400s, or about 23 minutes.
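
The back-of-the-envelope estimate can be written out as a few lines of Python. The inputs are the numbers quoted in this thread; the variable names are just for illustration, and the assumption that an orphaned checkout needs one full traversal per ref is mine, not something measured.

```python
# Numbers taken from the thread.
measured_ms = 85.0          # user time reported for the whole checkout
traversals_measured = 7     # list walks reported for that checkout
ref_count = 140_000         # ~100k refs/changes + ~40k refs/heads

# Exact per-traversal cost (about 12 ms), and the rounded 10 ms figure.
per_traversal_ms = measured_ms / traversals_measured
rounded_ms = 10.0

# Assumption: worst case is one traversal per ref.
est_seconds_rounded = ref_count * rounded_ms / 1000.0
est_seconds_exact = ref_count * per_traversal_ms / 1000.0

print("rounded: %.0f s (%.1f min)" % (est_seconds_rounded, est_seconds_rounded / 60))
print("exact:   %.0f s (%.1f min)" % (est_seconds_exact, est_seconds_exact / 60))
```

Without rounding down to 10ms the estimate comes out somewhat higher, but it is the same order of magnitude either way.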

> If resolve_ref() walking the linked list of refs was the problem, then
> I would expect my test repo to show the same problem. It doesn't, a pre
> ref-packing checkout took minutes (~0.5s user time), whereas a
> ref-packed checkout takes ~0.1s. So, I would suggest that the problem
> lies elsewhere.
> 
> Have you tried running a checkout whilst profiling?

No, to be honest, I am not familiar with any profiling tools.

-Martin

Employee of Qualcomm Innovation Center, Inc., which is a member of Code Aurora Forum