Re: Git performance results on a large repository

On Feb 3, 2012, at 9:56 AM, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Feb 3, 2012 at 15:20, Joshua Redstone <joshua.redstone@xxxxxx> wrote:
> 
>> We (Facebook) have been investigating source control systems to meet our
>> growing needs.  We already use git fairly widely, but have noticed it
>> getting slower as we grow, and we want to make sure we have a good story
>> going forward.  We're debating how to proceed and would like to solicit
>> people's thoughts.
> 
> Where I work we also have a relatively large Git repository. Around
> 30k files, a couple of hundred thousand commits, clone size around
> half a GB.
> 
> You haven't supplied background info on this but it really seems to me
> like your testcase is converting something like a humongous Perforce
> repository directly to Git.
> 
> While you /can/ do this, it's not a good idea. You should split up
> repositories at the boundaries that code or data doesn't directly
> cross, e.g. there's no reason why you need HipHop PHP in the same
> repository as Cassandra or the Facebook chat system, is there?
> 
> While Git could do better with large repositories (in particular,
> applying commits in interactive rebase seems to slow down on bigger
> repositories), there's only so much you can do about stat-ing 1.3
> million files.
> 
> A structure that would make more sense would be to split up that giant
> repository into a lot of other repositories, most of them probably
> have no direct dependencies on other components, but even those that
> do can sometimes just use some other repository as a submodule.
> 
> Even if you have the requirement that you'd like to roll out
> *everything* at a certain point in time you can still solve that with
> a super-repository that has all the other ones as submodules, and
> creates a tag for every rollout or something like that.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



I concur. I work at a company with many years of development history in several huge CVS repos, and we are slowly but surely migrating the codebase from CVS to Git.
Split things up. This will let you organize the code better, and IMHO there are no downsides.
As for rollout, I think that job should be given to a build/release system that can gather the necessary code from the different repos and tag it properly.
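
To make the rollout idea a bit more concrete, here is a minimal sketch in Python of the tagging half of such a build/release step. It is only an illustration under assumptions: the function name plan_release_tags, the repo paths, and the tag name are all hypothetical, and it only builds the git commands rather than running them, so a real release system would still have to execute them and handle failures.

```python
import shlex


def plan_release_tags(repo_paths, tag, message=None):
    """Return one `git tag` command per repository, all using the same tag.

    Hypothetical helper: it only plans the commands, it does not run them.
    """
    if not repo_paths:
        raise ValueError("no repositories given")
    message = message or f"Release {tag}"
    commands = []
    for path in repo_paths:
        # `git -C <path>` runs git as if it had been started inside <path>;
        # `tag -a <tag> -m <msg>` creates an annotated tag.
        commands.append(["git", "-C", path, "tag", "-a", tag, "-m", message])
    return commands


if __name__ == "__main__":
    # Repo locations and tag name are made up for illustration.
    for cmd in plan_release_tags(["repos/www", "repos/chat"], "rollout-2012-02-03"):
        print(shlex.join(cmd))
```

Each planned command could then be handed to subprocess.run(cmd, check=True), so that one failed tag stops the rollout instead of leaving the repos tagged inconsistently.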

just my 2 cents

Thanks,
Eugene
