Re: From P4 to Git

Arnaud Bailly <abailly@xxxxxxxxx> writes:

> I am investigating the use of Git in a setting where we:
>  1. branch a lot

Git handles a large number of branches very well.

The topic branch workflow, where each new feature is developed on a
separate branch which can be individually picked to be merged (or
not), is used by Git itself, for example.

>  2. have a very large codebase

How large a codebase?  What does "large codebase" mean here: a large
number of files, or files large in size (usually binary)?


Git can deal comfortably with a codebase the size of the Linux kernel.
The Perl 5 core was converted from Perforce to Git.

But Git is snapshot based, not changeset based, and treats the project
(repository) as a whole, not as a combination of single-file histories.
This means that it would be unproductive to use an 'everything in a
single repository' approach.  If your codebase is the size of the whole
KDE tree, or the whole GNOME tree, you would need to organize it and
split it into smaller, reasonably sized repositories (you can organize
them back together in a superproject using submodules).
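
For example, a superproject tying the pieces back together could look
like this (the repository URLs and paths are made up):

  $ git submodule add git://example.com/libfoo.git libfoo
  $ git submodule add git://example.com/app-gui.git gui
  $ git commit -m 'Add libfoo and gui as submodules'

  # and in a fresh clone of the superproject:
  $ git submodule init
  $ git submodule update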

If you can't do that, you would probably be better off with another
version control system, like Subversion (IIRC both KDE and
OpenOffice.org chose this free centralized version control system).


Because Git was created to version-control source code, it might not
work well with large binary files, meaning that performance would
suffer.

Partial checkouts (where you check out only part of a repository) have
been proposed, but are not implemented yet.  Neither is the lazy clone /
remote alternates idea.  You can do a bit with the undocumented `delta`
gitattribute, and by putting large binary blobs into a separate
packfile which is 'kept' (using a *.keep file) against repacking, and
perhaps made available on a networked filesystem.
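
Roughly something like this (the `delta` attribute is, as said,
undocumented, so treat it as an assumption; the pattern is made up):

  # do not try to deltify huge binaries
  $ echo '*.iso -delta' >> .gitattributes

  # pack what is there now, then mark the resulting packfile(s) as
  # kept, so that later repack/gc will leave them alone
  $ git repack -a -d
  $ for p in .git/objects/pack/pack-*.pack; do touch "${p%.pack}.keep"; done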

I think you can use the refs/replace/ mechanism (IIRC currently in the
'pu' (proposed updates) branch) to have two versions of a repository:
one with binary blobs and one without.
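
A rough sketch, assuming the 'git replace' porcelain from that series
(the commit names below are placeholders):

  # keep the slim history as the real one, and register a replacement
  # commit that carries the binary blobs
  $ git replace <slim-commit> <commit-with-blobs>

  # a clone that also fetches refs/replace/* sees the full version;
  # one that does not sees the slim one
  $ git fetch origin 'refs/replace/*:refs/replace/*'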

> 
> Given Git is developed to handle these 2 issues, I suspect it would be a
> very good choice, but I need to gather some experiments feedback and
> hard figures on how Git performs (storage use, necessary
> bandwidth/resources, maintenance of repositories, cleanup & gc...).
> 
> For the experiment part, I have started working on it but would be
> interested in other people's experiences.

Check out Sam Vilain's (?) reports on converting the Perl 5 repository
from Perforce to Git.

> 
> For the figures part, I think I read somewhere there exists some
> statistics on Git usage for Linux kernel. Is this correct ? If true,
> where could I find them ? 

There is the GitStat project: http://mirror.celinuxforum.org/gitstat/

There was also the Git-Statistics project at Google Summer of Code 2008,
whose repository can be found at http://repo.or.cz/w/git-stats.git
(see http://git.or.cz/gitwiki/SoC2008Projects).

> 
> Thanks in advance for answering my (maybe pointless) questions and for
> producing such a nice piece of software.
> -- 
> Arnaud Bailly

-- 
Jakub Narebski
Poland
ShadeHawk on #git
