Re: From P4 to Git

Jakub Narebski <jnareb@xxxxxxxxx> · Fri, 31 Jul 2009 11:22:42 +0200

I have re-added the git mailing list to Cc.

On Fri, 31 July 2009, Arnaud Bailly wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>>
>>>  2. have a very large codebase
>>
>> How large codebase?  What it means "large codebase"?  Large number of
>> files, or files large in size (usually binary)?
> 
> Mostly lot of source files, amounting to about 9 MLOCs of mixed
> languages source.
> 
>> Git can deal comfortably with codebase of the size of Linux kernel.
>> Perl 5 core was converted from Perforce to Git.
> 
> I guess we are in the same order of magnitude than kernel.

The Linux kernel, for whose development git was initially designed,
is around 7.5M LoC of code (10M LoC with comments and blank lines)[1].
The performance for such a large codebase has to be good (at least on
Linux and other POSIX systems), as good performance was one of the
design goals of git.

Perl 5 core, whose version control history was converted from Perforce
to Git in December 2008[2][3] by Sam Vilain (you might want to take
a look at how it was done; unfortunately it seems that Sam Vilain's blog
has vanished from the Internet), is 2.3M LoC of mixed-language code
(mainly Perl and C, with twice as much Perl), so it is smaller than
your codebase.

[1] According to the OSS software metrics site Ohloh (http://www.ohloh.net)
[2] http://news.perlfoundation.org/2008/12/perl_5_development_now_on_git.html
[3] http://use.perl.org/article.pl?sid=08/12/22/0830205
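
To compare your own tree against the figures above, a rough count of
"code" lines versus all lines can be sketched as below.  This is only
a minimal sketch and not how Ohloh counts: the extension table and the
function name count_lines are my own assumptions, and block comments
are not handled, so treat the result as an order-of-magnitude estimate.

```python
import os

# Hypothetical table: file extension -> line-comment prefix.
# Extend it for the languages present in your own tree.
COMMENT_PREFIXES = {
    ".c": "//",
    ".h": "//",
    ".pl": "#",
    ".pm": "#",
    ".py": "#",
}


def count_lines(root):
    """Return (code_lines, total_lines) for recognised files under root.

    A line counts as code if it is neither blank nor starts with the
    line-comment prefix for its file type.  Block comments are counted
    as code, so the code figure is an overestimate.
    """
    code = 0
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into version-control metadata directories.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        for name in filenames:
            prefix = COMMENT_PREFIXES.get(os.path.splitext(name)[1])
            if prefix is None:
                continue  # unknown file type, skip it
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                for line in fh:
                    total += 1
                    stripped = line.strip()
                    if stripped and not stripped.startswith(prefix):
                        code += 1
    return code, total
```

Calling count_lines(".") at the top of a checkout returns the pair
(code lines, total lines), which is the same kind of split as the
7.5M / 10M numbers quoted for the kernel.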

>> But git is snapshot based, not changeset based, and treats project
>> (repository) as whole, not as a combination of single file histories.
>> This means that it would be unproductive to use 'everything in single
>> repository' approach.  If your codebase is of the size of whole KDE
>> tree, or the whole GNOME tree, you would need to organize it and split
>> it into smaller, reasonably sized repositories (you can organize them
>> back together in a superproject using submodules).
> 
> That's my biggest concern. We are actually using a single tree repository
> approach with lot of branches. What led me to Git at first was the ease
> of branching and merging. I used branching and merging with Subversion
> and its painful.

So it looks like you wouldn't _need_ to split the source tree into
separate smaller repositories for performance reasons.  Still, it might
be a good idea to put separate (sub)projects into separate repositories.
But I guess you can do that at a later time (although it would be best
to do this upfront).

Branching and merging in Git is very easy (with Subversion 1.5 merging
is supposed to get easier).  Git itself uses a 'topic branches' workflow,
where each feature (each series of patches) gets its own branch, and
branches are individually picked to be merged (or to be dropped, or
replaced by a newer version of the series).
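
The topic-branch workflow described above can be sketched as a small
script driving the git command line.  This is only an illustration
under stated assumptions, not the actual procedure used for git.git:
the branch names, file names, the placeholder identity and the helper
functions (git, commit_file, topic_branch_demo) are all made up for
the example, and it needs git installed and an empty directory.

```python
import os
import subprocess


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    cmd = [
        "git",
        # Placeholder identity so commits work without global config.
        "-c", "user.name=Example",
        "-c", "user.email=example@example.invalid",
        *args,
    ]
    result = subprocess.run(
        cmd, cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_file(repo, name, text, message):
    """Write one file and commit it on the current branch."""
    with open(os.path.join(repo, name), "w") as fh:
        fh.write(text)
    git(repo, "add", name)
    git(repo, "commit", "-m", message)


def topic_branch_demo(repo):
    """Create two topic branches, merge one and drop the other."""
    git(repo, "init")
    # Name the initial branch explicitly, whatever the local default is.
    git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
    commit_file(repo, "README", "base\n", "Initial commit")

    # Each feature gets its own branch, forked from master.
    git(repo, "checkout", "-b", "topic/feature-a")
    commit_file(repo, "a.txt", "feature A\n", "Add feature A")

    git(repo, "checkout", "master")
    git(repo, "checkout", "-b", "topic/feature-b")
    commit_file(repo, "b.txt", "feature B\n", "Add feature B")

    # Back on master: feature A is picked to be merged ...
    git(repo, "checkout", "master")
    git(repo, "merge", "--no-ff", "-m", "Merge topic/feature-a",
        "topic/feature-a")
    # ... while feature B is dropped without ever touching master.
    git(repo, "branch", "-D", "topic/feature-b")

    # Files tracked on master after the merge.
    return sorted(git(repo, "ls-tree", "--name-only", "HEAD").split())
```

Running topic_branch_demo on a fresh temporary directory leaves master
with the merged feature only; the dropped branch costs nothing, which
is the point of keeping each series on a branch of its own.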

>> There is GitStat project: http://mirror.celinuxforum.org/gitstat/

Which follows the Linux kernel.  There are also some GitStat deployments
tracking other code.

>> There was also Git-Statistics project at Google Summer of Code 2008
>> which repository can be found at http://repo.or.cz/w/git-stats.git
>> See http://git.or.cz/gitwiki/SoC2008Projects
>>
> 
> Great.

And there are tools such as Ohloh, which compute software metrics, and
some of the code is available.  The GitHub hosting site also offers
some software metrics / statistics tools in its web interface; I don't
know about other sites such as Gitorious or InDefero.  Gitweb currently
doesn't offer any statistics.

-- 
Jakub Narebski
Poland