Re: Git benchmarks at OpenOffice.org wiki

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jan Holesovsky wrote:
> On Tuesday 01 May 2007 23:46, Jakub Narebski wrote:
> 
>> What I am concerned about is some of git benchmark results at Git page
>> on OpenOffice.org wiki:
>>   http://wiki.services.openoffice.org/wiki/Git#Comparison

>> The problem is with 'Size of checkout': to start working in repository
>> one needs 1.4G (sources) and 98M (third party) for CVS checkout (it is
>> 1.5G for sources for Subversion checkout). Ordinary for distributed SCM
>> you would need size of repository + size of sources (working area),
>> which is 2.8G for sources and 688M for third party stuff files you can
>> hack on + the history]. This makes some prefer to go centralized SCM
>> route, i.e. Subversion as replacement for CVS (+ CWS, ChildWorkSpace).
> 
> Considering the size OOo needs for build (>8G without languages),
> the ~1.4G overhead for history is very well bearable.  I am surprised about
> the 100M overhead for SVN as well - from my experience it is usually about
> the size of the project itself; but maybe they improved something in SVN
> in the meantime.

I think the supposition that SVN uses hardlinks for pristine copy
of sources (HEAD version) seems probable; then there it is 100M overhead
plus size of changed files, and of course this tricks works only on
filesystems which support hardlinks, and assumes either hardlinks being
COW-links (copy-on-write) or editor behaving.
 
>> What might help here is splitting repository into current (e.g. from
>> OOo 2.0) and historical part,
> 
> No, I don't want this ;-)

I forgot to add there is possible to graft historical repository to the
current work repository, resulting in full history available. For example
Linux kernel repository has backported from BK historical repository, and
there is grafts file which connect those two repositories.

>> and / or using shallow clone. 

git-clone(1):

--depth <depth>::
        Create a 'shallow' clone with a history truncated to the
        specified number of revs.  A shallow repository has
        number of limitations (you cannot clone or fetch from
        it, nor push from nor into it), but is adequate if you
        want to only look at near the tip of a large project
        with a long history, and would want to send in a fixes
        as patches.

It is possible that those limitations will be lifted in the future
(if possible), so there is alternate possibility to reduce needed
disk space for git checkout. But certainly this is not for everybody.

>> Implementing  
>> partial checkouts, i.e. checking out only part of working area (...)

The problem with implementing this feature (you can do partial checkout
using low level commands, but this feature is not implemented [yet?]
per se) is with doing merge on part which is not checked out. Might
not be a problem for OOo; but this might be also not needed for OOo.
Sometimes submodules are better, sometimes partial checkout is the
only way: see below.

>> Splitting repository into submodules, and submodule 
>> support -- it depends on organization of OOo sources, would certainly
>> help for third party stuff repository.
> 
> We should better split the OOo sources; it's a process that already started
> [UNO runtime environment vs. OOo without URE], and I proposed some more
> changes already.

In my opinion each submodule should be able to compile and test by
itself. You can go X.Org route with splitting sources into modules...
or you can make use of the new submodules support (currently plumbing
level, i.e. low level commands), aka. gitlinks.

The submodules support makes it possible to split sources into
independent modules (parts), which can be developed independently,
and which you can download (clone, fetch) or not, while making it
possible to bind it all together into one superproject.

See (somewhat not up to date) http://git.or.cz/gitwiki/SubprojectSupport
page on git wiki.

>> What I'm really concerned about is branch switch and merging branches,
>> when one of the branches is an old one (e.g. unxsplash branch), which
>> takes 3min (!) according to the benchmark. 13-25sec for commit is also
>> bit long, but BRANCH SWITCHING which takes 3 MINUTES!? There is no
>> comparison benchmark for CVS or Subversion, though...

By the way, the time to switch branch should be proportional to number
of changed files, which you can get with "git diff --summary unxsplash
HEAD". Or to be more realistic to checkout some old version
(some old tag), as usually branches which got merged in are deleted
(or even never got published). For example when bisecting some bug:
Subversion doesn't have bisect, does it?

I wonder if running "git pack-refs" would help this benchmark...

-- 
Jakub Narebski
Poland
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]