On Thu, May 13, 2010 at 10:57:22PM -0400, Ali Tofigh wrote:

> short version: will git handle large number of files efficiently if
> the history is simple and linear, i.e., without merges?

Short answer: large numbers of files, yes; large files, not really. The
shape of history is largely irrelevant.

Longer answer: git separates the conceptual structure of history (the
digraph of commits, and the pointers from commits to trees to blobs)
from the actual storage of the objects representing that history.
Problems with large files are usually storage issues: copying them
around in packfiles is expensive, storing an extra copy in the repo is
expensive, and trying deltas and diffs is expensive. None of those
things has to do with the shape of your history, so I would expect git
to handle such a load with a linear history about as well as a complex
history with merges.

For large numbers of files, git generally does a good job, especially
if those files are distributed throughout a directory hierarchy. But
keep in mind that the git repo will store another copy of every file.
The copies will be delta-compressed between versions and zlib-compressed
overall, but you may potentially double the amount of disk space
required if you have a lot of incompressible binary files.

For large files, git expects to be able to pull each file into memory
(sometimes two versions, if you are doing a diff), and it will copy
those files around when repacking (which you will want to do for the
sake of the smaller files). So files on the order of a few megabytes
are not a problem; if you have files in the hundreds of megabytes or
gigabytes, expect some operations, like repacking, to be slow.

Really, I would start by just "git add"-ing your whole filesystem,
doing a "git repack -ad", and seeing how long it takes and what the
resulting size is.

-Peff
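
To make that last suggestion concrete, here is a minimal sketch of such
a test as a shell session. The /path/to/data directory and the commit
message are placeholders; "git count-objects" and "du" are just two
convenient ways to check the resulting repository size:

    # turn the existing directory tree into a throwaway repository
    cd /path/to/data        # placeholder path
    git init

    # add and commit everything once; this writes loose objects for the
    # file contents, so it shows the baseline import cost
    git add .
    git commit -m "initial import"

    # repack all objects into a single packfile and time it
    time git repack -a -d

    # see how big the repository ended up
    git count-objects -v
    du -sh .git

("git repack -ad" in the message and "git repack -a -d" here are the
same invocation; the short options simply combine.)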