Re: Git is not scalable with too many refs/*

On Mon, 26 Sep 2011 15:39:33 -0600, Martin Fick wrote:
On Monday, September 26, 2011 02:28:53 pm Julian Phillips wrote:
On Mon, 26 Sep 2011 14:01:38 -0600, Martin Fick wrote:
-- snip --

> So, maybe you are correct, maybe my repo is the corner
> case? Is a repo which needs to be gced considered a
> corner case? Should git be able to detect that the
> repo is so in desperate need of gcing?  Is it normal
> for git to need to gc right after a clone and then
> fetching ~100K refs?

Were your 100k refs packed before the gc?  If not, perhaps
your refs are causing a lot of trouble for the merge
sort?  They will be written out sorted to the
packed-refs file, so the merge sort won't have to do any
real work when loading them after that...

I am not sure how to determine that (?), but I think they
were packed.  Under .git/objects/pack there were 2 large
files, both close to 500MB.  Those 2 files constituted most
of the space in the repo (I was wrong about the repo sizes,
that included the working dir, so think about half the
quoted sizes for all of .git).  So does that mean it is
mostly packed?  Aside from the pack and idx files, there was
nothing else under the objects dir.  After gcing, it is down
to just one ~500MB pack file.

Refs listed under .git/refs/... are unpacked (loose); refs listed in .git/packed-refs are packed.
A ref can be in both places if it has been updated since the last pack.
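
For anyone wanting to check this on their own repo, here is a rough sketch in Python (not part of git) that counts refs in each state. It assumes the usual on-disk layout described above: one file per loose ref under refs/, and a packed-refs file with "<sha> <refname>" lines plus "#" header and "^" peeled-tag lines. The function names are made up for illustration.

```python
import os

def read_packed_refs(git_dir):
    """Return the set of ref names listed in <git_dir>/packed-refs."""
    packed = set()
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(path):
        return packed
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, the "# pack-refs ..." header, and "^<sha>" peeled-tag lines.
            if not line or line.startswith("#") or line.startswith("^"):
                continue
            _sha, _, name = line.partition(" ")
            if name:
                packed.add(name)
    return packed

def read_loose_refs(git_dir):
    """Return the set of ref names stored as individual files under <git_dir>/refs."""
    loose = set()
    refs_root = os.path.join(git_dir, "refs")
    for dirpath, _dirs, files in os.walk(refs_root):
        for fname in files:
            full = os.path.join(dirpath, fname)
            # Ref names always use "/" regardless of the platform separator.
            loose.add(os.path.relpath(full, git_dir).replace(os.sep, "/"))
    return loose

def summarize_refs(git_dir):
    """Count refs that are packed only, loose only, or present in both places."""
    packed = read_packed_refs(git_dir)
    loose = read_loose_refs(git_dir)
    return {
        "packed_only": len(packed - loose),
        "loose_only": len(loose - packed),
        "both": len(packed & loose),
    }
```

Calling summarize_refs(".git") from the top of a working tree should give a quick idea of whether the 100k refs are mostly loose or mostly packed.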

> I am not sure what is right here, if this patch makes a
> repo which needs gcing degrade 5 to 10 times worse
> than the benefit of this patch, it still seems
> questionable to me.

Well - it does this _for your repo_, that doesn't
automatically mean that it does generally, or
frequently.

Oh, def agreed! I just didn't want to discount it so quickly
as being a corner case.


For instance, none of my normal repos that
have a lot of refs are Gerrit ones, and I wouldn't be
surprised if they benefitted from the merge sort
(assuming that I am right that the merge sort is taking
a long time on your gerrit refs).

Besides, you would be better off running gc, and thus
getting the benefit too.

Agreed, which is why I was asking if git should have noticed
my "degenerate" case and auto gced?  But hopefully, there is
an actual bug here somewhere and we both will get to eat our
cake. :)

I think automatic gc is currently only triggered by unpacked objects, not unpacked refs ... perhaps the auto-gc should cover refs too?

>> Random thought.  What happens in the with-compression
>> case if you leave the commit in, but add a sleep(15)
>> to the end of sort_refs_list?
>
> Why, what are you thinking?  Hmm, I am trying this on
> the non gced repo and it doesn't seem to be completing
> (no cpu usage)!  It appears that perhaps it is being
> called many times (the sleeping would explain no cpu
> usage)?!?  This could be a real problem, this should
> only get called once right?

I was just wondering if the time taken to get the refs
was changing the interaction with something else.  Not
very likely, but ...

I added a print statement, and it was called four times
when I had unpacked refs, and once with packed.  So,
maybe you are hitting some nasty case with unpacked
refs.  If you use a print statement instead of a sleep,
how many times does sort_refs_list get called in your
unpacked case?  It may well also be worth calculating
the time taken to do the sort.
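
The kind of measurement suggested above (count the calls, and total up the time spent in them) can be sketched as follows. This is Python rather than the C in git itself, and sort_refs here is only a hypothetical stand-in for the real sort function, so treat it as an illustration of the instrumentation and not of git's code.

```python
import time
from functools import wraps

def instrumented(func):
    """Wrap func so that it records how often it is called and for how long."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Update the counters even if func raises.
            wrapper.calls += 1
            wrapper.total_seconds += time.perf_counter() - start
    wrapper.calls = 0
    wrapper.total_seconds = 0.0
    return wrapper

@instrumented
def sort_refs(names):
    # Hypothetical stand-in for the ref sort; the real one is a C merge sort.
    return sorted(names)
```

After the workload has run, sort_refs.calls and sort_refs.total_seconds give the two numbers of interest: a single call versus thousands of calls would show up immediately.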

In my case it was called 18785 times!  Any other tests I
should run?

That's a lot of sorts. I really can't see why there would need to be more than one ...

I've created a new test repo, using a more complicated method to construct the 100k refs, and it took ~40 minutes to run "git branch" instead of the 1.2s for the previous repo. So, I think the ref naming pattern used by Gerrit is definitely triggering something odd. However, progress is a bit slow now that it takes over half an hour to try things out ...
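
For reference, a minimal sketch of generating Gerrit-style ref names, for anyone wanting to build a similar test repo. This is an assumption about the layout, not the exact method used for the repo above: Gerrit stores changes as refs/changes/<last two digits of change number>/<change number>/<patch set>. The function name and parameters are made up for illustration.

```python
def gerrit_ref_names(num_changes, patch_sets_per_change=1):
    """Yield Gerrit-style change ref names in creation order.

    Assumed layout: refs/changes/<NN>/<change>/<patch set>, where NN is the
    change number modulo 100, zero-padded to two digits.
    """
    for change in range(1, num_changes + 1):
        shard = f"{change % 100:02d}"
        for patch_set in range(1, patch_sets_per_change + 1):
            yield f"refs/changes/{shard}/{change}/{patch_set}"

def count_ref_directories(names):
    """Count the distinct directories that would hold these refs as loose files."""
    return len({name.rsplit("/", 1)[0] for name in names})
```

Two things stand out with this pattern: the names are not in sorted order as they are created, and every change gets its own directory, so 100k single-patch-set changes means 100k leaf directories under refs/changes. Whether either of those is what triggers the slowdown is not established here.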

--
Julian