Re: Advice on choosing git

On Tue, May 11, 2010 at 11:31:34PM -0700, Noah Silverman wrote:
> 
> 1) Size.  THIS IS MY MAIN CONCERN - If I want to sync my home, office,
> and server Document directories.  From what I have read, I will
> effectively have multiple copies of each item on my hard drive, thus
> eating up a lot of space (One of the "working file"and several in the
> .git directory.)

Usually, Git is more efficient in disk space than other DVCSs, because
it uses packs (packfiles) to store objects. Each pack contains deltified
and then zlib-compressed data, and this deltification is done not only
relative to the direct ancestor but potentially against any suitable
candidate (there are some heuristics to find the best one). But when you
add a new file (or a new version of a file) to the repository, it is
first stored just zlib-compressed inside .git/objects. Such files are
often referred to as "loose" objects in the Git documentation. When you
have a lot of loose objects, the garbage collector is activated
automatically and packs them together. Of course, you can also run
"git gc" manually, or configure the threshold for what counts as too
many loose objects (the gc.auto setting).

Even objects that are stored loose are never transferred separately over
the network. When you pull or push, all required objects are packed
together into a single pack, and this pack is sent to the other side.
So, on the other side they are normally not stored as separate files
either (only small transfers, below the transfer.unpackLimit threshold,
are unpacked into loose objects). But each push or pull can create a new
pack, and if you have too many small packs, "git gc" will combine them
into a single pack.

However, if you have huge multimedia files, I am not sure how well Git
handles them. There have been some improvements to Git recently, and
there is a fork of Git that specifically focuses on this problem:
http://caca.zoy.org/wiki/git-bigfile
but I don't know much about it.

> several full versions of it on my machine.  This could be a problem for
> a directory with 100GB or more, especially on a laptop with limited hard
> drive space.  I know Subversion is a dirty word around here, but it
> seemed to only annotate and send the changes

Actually, Subversion is very inefficient in its use of space (at least,
it was the last time I used it). I had a repository where a Subversion
checkout took much more space than the Git working tree and the whole
repository with all its history combined! Of course, a centralized VCS
does not have to store the whole history on each client, which saves
space, but having the whole history with you is very handy, and it also
avoids having a single point of failure.

BTW, Git allows you to make a shallow clone (git clone --depth) to save
space by not storing the whole history (only the specified number of
most recent revisions), but I have never used this feature, and it has
some limitations.
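A shallow clone is requested with the --depth option of git clone. The
small Python wrapper below only builds and runs that command; the
function names and the example URL are hypothetical, and the
limitations mentioned above (which depend on your Git version) still
apply.

```python
import subprocess
from typing import List


def shallow_clone_command(url: str, dest: str, depth: int = 1) -> List[str]:
    """Argument list for a clone that keeps only the last `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url: str, dest: str, depth: int = 1) -> None:
    """Run the clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)
```

For example, shallow_clone("https://example.com/documents.git",
"documents", depth=1) would fetch only the latest commit of a
(hypothetical) documents repository.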

> 
> 2) Sub-directory selection.  On my laptop, I only want a few
> sub-directories to be synced up.  I don't need my whole document tree,
> but just a few directories of things I work on.

Synchronization works on what you have committed to your repository. At
this level, directories are completely irrelevant. You probably want a
separate repository for each sub-directory that you want to synchronize
separately; then you can tie them together using the git submodule
mechanism or a trivial shell script that synchronizes all of them.
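The "trivial shell script" could be little more than a loop that
updates each repository in turn. Here is a sketch of the same idea in
Python, assuming each repository already has an upstream configured to
pull from; the paths in REPOS are hypothetical placeholders, and
"git -C <dir>" simply runs Git as if it had been started in <dir>.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

# Hypothetical layout: one Git repository per synchronized sub-directory.
REPOS = ["~/Documents/projects", "~/Documents/papers"]


def sync_commands(repos: Iterable[str]) -> List[List[str]]:
    """One "git pull" command per repository, with "~" expanded."""
    return [["git", "-C", str(Path(r).expanduser()), "pull"] for r in repos]


def sync_all(repos: Iterable[str]) -> None:
    """Pull every repository in turn."""
    for cmd in sync_commands(repos):
        # check=True stops at the first repository whose pull fails.
        subprocess.run(cmd, check=True)
```

Calling sync_all(REPOS) on the laptop would then update only the
sub-directories you chose to keep there; a full two-way sync would also
push your local commits.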

In fact, the basic concept of Git is to treat a single repository as a
whole. So, if you have some pieces that are unrelated to each other, it
is better to store them in separate repositories. It will improve speed
and possibly disk usage, because deltification will have an easier time
finding related files, so compression will be better.
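Why related content matters can be shown with a toy Python example. It
is not Git's algorithm: it uses zlib's preset-dictionary feature as a
rough stand-in for a delta, and picks whichever candidate base makes a
new version of a file cheapest to store. The helper names and the
sample "documents" are made up for illustration.

```python
import random
import zlib
from typing import Dict


def delta_cost(target: bytes, base: bytes) -> int:
    """Rough stand-in for a delta size: `target` compressed with `base` as a
    zlib preset dictionary, so content shared with `base` is almost free."""
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


def pick_delta_base(target: bytes, candidates: Dict[str, bytes]) -> str:
    """Return the name of the candidate that makes `target` cheapest to store."""
    return min(candidates, key=lambda name: delta_cost(target, candidates[name]))


def random_text(seed: int, n_words: int = 300) -> bytes:
    """Deterministic pseudo-random 'document' built from a small vocabulary."""
    rng = random.Random(seed)
    words = ["report", "budget", "draft", "meeting", "notes", "final", "review", "plan"]
    return " ".join(rng.choice(words) for _ in range(n_words)).encode()


old_version = random_text(seed=1)
new_version = old_version.replace(b"draft", b"DRAFT", 1)  # a small edit
unrelated = random_text(seed=2)

best = pick_delta_base(new_version, {"old_version": old_version, "unrelated": unrelated})
print(best)  # old_version
```

Git's real delta search in git pack-objects uses its own heuristics and
compares each object only against a limited window of candidates (see
pack.window), so treat this purely as an illustration of why keeping
related files together helps compression.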

> 
> Bazaar also looks like a possible option, but I'm not sure it handles
> drive usage better.  Their website has a lengthy manifesto about how
> they're better than Git, but I don't have enough experience with either
> to make an informed decision.

Well, this manifesto sounds like it was written by a marketing guy, and
it compares Bazaar to a rather old version of Git... So I am not going
to comment on it.

In fact, any meaningful comparison has to consider your workflow. Git
targets a fully distributed workflow, which may even involve a hierarchy
of repositories, while Bazaar focuses on a more centralized approach,
closer to what you have with Subversion. So, people who are used to a
centralized VCS may find Bazaar easier at the beginning, but IMHO, Git
is more flexible, and once you learn the basic principles everything
feels very natural.

In any case, your main concern was the size of the repository, and
even this marketing piece from Bazaar admits that Git is better at
saving disk space.

Here you can see a comparison of repository sizes for Git, Mercurial,
and Bazaar:
http://vcscompare.blogspot.com/2008/06/git-mercurial-bazaar-repository-size.html



Dmitry