Re: Git is not scalable with too many refs/*

On Monday, September 26, 2011 09:15:29 am Martin Fick wrote:
> OK, I have found what I believe is another performance
> regression for large ref counts (~100K).
> 
> When I run git br on my repo, which has only one branch
> but ~100K refs under refs/changes (a Gerrit repo), it
> normally takes 3-6 minutes, depending on whether my caches
> are fresh or not.  After bisecting some older changes, I
> noticed that this commit seems to be where things start to
> get slow: c774aab98ce6c5ef7aaacbef38da0a501eb671d4
> 
> 
> commit c774aab98ce6c5ef7aaacbef38da0a501eb671d4
> Author: Julian Phillips <julian@xxxxxxxxxxxxxxxxx>
> Date:   Tue Apr 17 02:42:50 2007 +0100
> 
>     refs.c: add a function to sort a ref list, rather then sorting on add
> 
>     Rather than sorting the refs list while building it, sort in one
>     go after it is built using a merge sort.  This has a large
>     performance boost with large numbers of refs.
> 
>     It shouldn't happen that we read duplicate entries into the same
>     list, but just in case sort_ref_list drops them if the SHA1s are
>     the same, or dies, as we have no way of knowing which one is the
>     correct one.
> 
>     Signed-off-by: Julian Phillips <julian@xxxxxxxxxxxxxxxxx>
>     Acked-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>     Signed-off-by: Junio C Hamano <junkio@xxxxxxx>
> 
> which is a bit strange since that commit's purpose was to
> actually speed things up in the case of many refs.  Just
> to verify, I reverted the commit on 1.7.7.rc0.73 and,
> sure enough, things sped up to the 14-20s range,
> depending on caching.
> 
> If this change does not actually speed things up, should
> it be reverted?  Or was there a bug in the change that
> makes it not do what it was supposed to do?
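
For anyone not familiar with that commit, the idea in its message can be
sketched roughly as below.  This is only a Python illustration, not the
refs.c code: the (name, sha1) tuple layout is my own assumption, and
Python's built-in sort (Timsort, a merge-sort hybrid) stands in for the
hand-written merge sort the commit message mentions.

```python
def sort_ref_list(refs):
    """Sort (name, sha1) pairs by name in one pass after the list is built.

    Duplicate names with the same SHA1 are dropped; duplicate names with
    different SHA1s raise an error, since there is no way to pick one.
    """
    # One O(n log n) sort after building the list, instead of inserting
    # every ref into its sorted position as it is added (O(n^2) overall).
    ordered = sorted(refs, key=lambda ref: ref[0])
    result = []
    for name, sha1 in ordered:
        if result and result[-1][0] == name:
            if result[-1][1] != sha1:
                raise ValueError(f"duplicate ref {name} with different SHA1s")
            continue  # identical duplicate: keep only the first copy
        result.append((name, sha1))
    return result
```

With ~100K refs, the difference between the two approaches should be
large in favor of sorting once, which is why the slowdown is surprising.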


Ahh, I think I have some more clues.  While this change
does not normally speed things up for me, I found a case
where it does!  I set my .git/config to have

  [core]
        compression = 0

and ran git gc on my repo.  Now, with a modern git that has this
optimization in it (1.7.6, 1.7.7.rc0...), 'git branch' is
almost instantaneous (0.05s)!  But if I revert c774aa, it
takes upwards of 15s.
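
In case someone wants to try the same comparison on their own repo, here
is a rough sketch of how the steps above could be scripted.  The helper
names are hypothetical, and it only wraps the commands already mentioned
(git config, git gc, git branch); it is not a careful benchmark.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_git_branch(repo, runs=3):
    """Time 'git branch' in repo several times (the first run may hit cold caches)."""
    return [time_command(["git", "branch"], cwd=repo) for _ in range(runs)]


def set_compression_zero_and_gc(repo):
    """Set core.compression = 0 in repo's config, then run 'git gc' there."""
    subprocess.run(["git", "config", "core.compression", "0"], cwd=repo, check=True)
    subprocess.run(["git", "gc"], cwd=repo, check=True)
```

Calling time_git_branch() on a repo before and after
set_compression_zero_and_gc(), once with and once without c774aa
reverted, would give the four numbers being compared here.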

So, it appears that this optimization is foiled by
compression?  When the optimization helps, it saves about
15s; when it hurts (with compression), it seems to cost
more than 3 minutes.  I am not sure this optimization is
worth it.  Would there be some way for it to adjust to the
repo conditions?

 
Thanks,

-Martin

-- 
Employee of Qualcomm Innovation Center, Inc. which is a 
member of Code Aurora Forum