Re: Git is not scalable with too many refs/*


On Monday, September 26, 2011 02:28:53 pm Julian Phillips 
wrote:
> On Mon, 26 Sep 2011 14:01:38 -0600, Martin Fick wrote:
> -- snip --
> 
> > So, maybe you are correct, maybe my repo is the corner
> > case? Is a repo which needs to be gced considered a
> > corner case? Should git be able to detect that the
> > repo is in such desperate need of gcing?  Is it normal
> > for git to need to gc right after a clone and then
> > fetching ~100K refs?
> 
> Were your 100k refs packed before the gc?  If not, perhaps
> your refs are causing a lot of trouble for the merge
> sort?  They will be written out sorted to the
> packed-refs file, so the merge sort won't have to do any
> real work when loading them after that...

I am not sure how to determine that (?), but I think they 
were packed.  Under .git/objects/pack there were 2 large 
files, both close to 500MB.  Those 2 files constituted most 
of the space in the repo (I was wrong about the repo sizes: 
they included the working dir, so think about half the 
quoted sizes for all of .git).  So does that mean it is 
mostly packed?  Aside from the pack and idx files, there was 
nothing else under the objects dir.  After gcing, it is down 
to just one ~500MB pack file.
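
For reference on the "how to determine that" question: the files under .git/objects/pack hold packed objects, while packed refs are listed in the .git/packed-refs file and loose refs are individual files under .git/refs. A minimal sketch for counting each kind follows. The helper name count_refs is hypothetical, and it assumes a plain on-disk repository whose packed-refs file uses the usual "<sha> <refname>" lines plus "#" header and "^" peeled-tag lines.

```python
import os


def count_refs(git_dir):
    """Return (loose, packed) ref counts for the repository at git_dir.

    Hypothetical helper: it only inspects files on disk and does not
    call git itself.
    """
    # Loose refs: one file per ref somewhere under <git_dir>/refs.
    loose = 0
    for _root, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
        loose += len(files)

    # Packed refs: one "<sha> <refname>" line each in <git_dir>/packed-refs.
    packed = 0
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed_path):
        with open(packed_path) as fh:
            for line in fh:
                # Skip blank lines, the "# pack-refs with:" header,
                # and "^<sha>" peeled-tag lines.
                if line.strip() and not line.startswith(("#", "^")):
                    packed += 1
    return loose, packed
```

Calling count_refs(".git") from the top of a working tree gives the two counts; a large first number would suggest that most refs are still loose, whatever the state of the object packs.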


> > I am not sure what is right here, if this patch makes a
> > repo which needs gcing degrade 5 to 10 times worse
> > than the benefit of this patch, it still seems
> > questionable to me.
> 
> Well - it does this _for your repo_, that doesn't
> automatically mean that it does generally, or
> frequently.  

Oh, definitely agreed! I just didn't want to discount it so quickly 
as being a corner case.


> For instance, none of my normal repos that
> have a lot of refs are Gerrit ones, and I wouldn't be
> surprised if they benefitted from the merge sort
> (assuming that I am right that the merge sort is taking
> a long time on your gerrit refs).
> 
> Besides, you would be better off running gc, and thus
> getting the benefit too.

Agreed, which is why I was asking if git should have noticed 
my "degenerate" case and auto gced?  But hopefully, there is 
an actual bug here somewhere and we both will get to eat our 
cake. :)



> >> Random thought.  What happens to the with-compression
> >> case if you leave the commit in, but add a sleep(15)
> >> to the end of sort_refs_list?
> > 
> > Why, what are you thinking?  Hmm, I am trying this on
> > the non gced repo and it doesn't seem to be completing
> > (no cpu usage)!  It appears that perhaps it is being
> > called many times (the sleeping would explain no cpu
> > usage)?!?  This could be a real problem, this should
> > only get called once right?
> 
> I was just wondering if the time taken to get the refs
> was changing the interaction with something else.  Not
> very likely, but ...
> 
> I added a print statement, and it was called four times
> when I had unpacked refs, and once with packed.  So,
> maybe you are hitting some nasty case with unpacked
> refs.  If you use a print statement instead of a sleep,
> how many times does sort_refs_list get called in your
> unpacked case?  It may well also be worth calculating
> the time taken to do the sort.

In my case it was called 18785 times!  Any other tests I 
should run?
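
The measurement suggested above (count the calls, and also time the sort) can be sketched outside of git as follows. This is only an illustration in Python, not git's C code: sort_refs and load_refs are hypothetical stand-ins, and the "sort after every insert" path is an assumed pattern used to show how a call count can grow with the number of refs, not a diagnosis of what git actually does.

```python
import functools
import time


def instrumented(func):
    """Wrap func so it records how often it runs and for how long in total."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.calls += 1
            wrapper.seconds += time.perf_counter() - start
    wrapper.calls = 0
    wrapper.seconds = 0.0
    return wrapper


@instrumented
def sort_refs(refs):
    # Hypothetical stand-in for the sort routine being investigated.
    return sorted(refs)


def load_refs(names, resort_every_insert):
    """Hypothetical loader: sort once at the end, or after every insert."""
    refs = []
    for name in names:
        refs.append(name)
        if resort_every_insert:
            refs = sort_refs(refs)
    if not resort_every_insert:
        refs = sort_refs(refs)
    return refs
```

After a run, sort_refs.calls and sort_refs.seconds hold the two numbers of interest. Loading N names with resort_every_insert=True gives N calls, against a single call otherwise, which is the kind of difference a print statement in the real function would expose.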

-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum