I have re-added the git mailing list to Cc.

On Fri, 31 July 2009, Arnaud Bailly wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>>
>>> 2. have a very large codebase
>>
>> How large codebase? What it means "large codebase"? Large number of
>> files, or files large in size (usually binary)?
>
> Mostly lot of source files, amounting to about 9 MLOCs of mixed
> languages source.
>
>> Git can deal comfortably with codebase of the size of Linux kernel.
>> Perl 5 core was converted from Perforce to Git.
>
> I guess we are in the same order of magnitude than kernel.

The Linux kernel, for whose development git was initially designed, is
around 7.5M LoC (10M lines including comments and blank lines)[1].
Performance on a codebase of that size has to be good (at least on Linux
and other POSIX systems), as good performance was one of git's design
goals.

The Perl 5 core, whose version control history was converted from
Perforce to Git in December 2008[2][3] by Sam Vilain, is 2.3M LoC of
mixed-language code (mainly Perl and C, with twice as much Perl as C),
so it is smaller than your codebase.  You might want to take a look at
how the conversion was done; unfortunately, it seems that Sam Vilain's
blog has vanished from the Internet.

[1] According to the OSS software metrics site Ohloh (http://www.ohloh.net)
[2] http://news.perlfoundation.org/2008/12/perl_5_development_now_on_git.html
[3] http://use.perl.org/article.pl?sid=08/12/22/0830205

>> But git is snapshot based, not changeset based, and treats project
>> (repository) as whole, not as a combination of single file histories.
>> This means that it would be unproductive to use 'everything in single
>> repository' approach.  If your codebase is of the size of whole KDE
>> tree, or the whole GNOME tree, you would need to organize it and split
>> it into smaller, reasonably sized repositories (you can organize them
>> back together in a superproject using submodules).
>
> That's my biggest concern.
> We are actually using a single tree repository approach with lot of
> branches.  What led me to Git at first was the ease of branching and
> merging.  I used branching and merging with Subversion and it's
> painful.

So it looks like you wouldn't _need_ to split the source tree into
separate, smaller repositories for performance reasons.  It might still
be a good idea to put separate (sub)projects into separate repositories.
I guess you can do that at a later time, although it would be best to do
it upfront.

Branching and merging in Git is very easy (merging is supposed to have
become easier with Subversion 1.5).  Git itself uses a 'topic branches'
workflow, where each feature (each series of patches) gets its own
branch, and the maintainer picks which branches are merged (or dropped,
or replaced by a newer version of the series).

>> There is GitStat project: http://mirror.celinuxforum.org/gitstat/

Which follows the Linux kernel.  There are also some GitStat deployments
tracking other code.

>> There was also Git-Statistics project at Google Summer of Code 2008
>> which repository can be found at http://repo.or.cz/w/git-stats.git

See http://git.or.cz/gitwiki/SoC2008Projects

> Great.

There are also tools such as Ohloh, which compute software metrics, and
some of their code is available.  The GitHub hosting site also offers
some software metrics / statistics tools in its web interface; I don't
know about other sites such as Gitorious or InDefero.  Gitweb currently
doesn't offer any statistics.

-- 
Jakub Narebski
Poland