Re: Git is not scalable with too many refs/*

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 26 Sep 2011 15:52:04 -0600, Martin Fick wrote:
On Monday, September 26, 2011 03:39:33 pm Martin Fick wrote:
On Monday, September 26, 2011 02:28:53 pm Julian Phillips
wrote:
> >> Random thought.  What happens to the with
> >> compression case if you leave the commit in, but
> >> add a sleep(15) to the end of sort_refs_list?
> >
> > Why, what are you thinking?  Hmm, I am trying this on
> > the non gced repo and it doesn't seem to be
> > completing (no cpu usage)!  It appears that perhaps
> > it is being called many times (the sleeping would
> > explain no cpu usage)?!?  This could be a real
> > problem, this should only get called once right?
>
> I was just wondering if the time taken to get the refs
> was changing the interaction with something else.  Not
> very likely, but ...
>
> I added a print statement, and it was called four times
> when I had unpacked refs, and once with packed.  So,
> maybe you are hitting some nasty case with unpacked
> refs.  If you use a print statement instead of a sleep,
> how many times does sort_refs_lists get called in your
> unpacked case?  It may well also be worth calculating
> the time taken to do the sort.

In my case it was called 18785 times!  Any other tests I
should run?

Gerrit stores the changes in directories under refs/changes
named after the last 2 digits of the change.  Then under
each change it stores each patchset.  So it looks like this:
refs/changes/dd/change_num/ps_num

I noticed that:

 ls refs/changes/* | wc -l
 -> 18876

somewhat close, but not super close to 18785,  I am not sure
if that is a clue.  It's almost like each change is causing
a re-sort,

basically, it is ...

Back when I made that change, I failed to notice that get_ref_dir was recursive for subdirectories ... sorry ...

Hopefully this should speed things up. My test repo went from ~17m user time, to ~2.5s.
Packing still make things much faster of course.

diff --git a/refs.c b/refs.c
index a615043..212e7ec 100644
--- a/refs.c
+++ b/refs.c
@@ -319,7 +319,7 @@ static struct ref_list *get_ref_dir(const char *submodule, c
                free(ref);
                closedir(dir);
        }
-       return sort_ref_list(list);
+       return list;
 }

 struct warn_if_dangling_data {
@@ -361,11 +361,13 @@ static struct ref_list *get_loose_refs(const char *submodu
        if (submodule) {
                free_ref_list(submodule_refs.loose);
submodule_refs.loose = get_ref_dir(submodule, "refs", NULL); + submodule_refs.loose = sort_refs_list(submodule_refs.loose);
                return submodule_refs.loose;
        }

        if (!cached_refs.did_loose) {
                cached_refs.loose = get_ref_dir(NULL, "refs", NULL);
+               cached_refs.loose = sort_refs_list(cached_refs.loose);
                cached_refs.did_loose = 1;
        }
        return cached_refs.loose;





-Martin

--
Julian
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]