On 3/30/2012 3:34 PM, Jeff King wrote:
On Fri, Mar 30, 2012 at 03:51:20PM -0400, Bo Chen wrote:
The sub-problems of "delta for large file" problem.
1 large file
1.1 text file (always deltas well? needs to be confirmed)
...But let's take a step back for a moment. Forget about whether a file
is binary or not. Imagine you want to store a very large file in
git.
...Nowadays, we stream large files directly into their own packfiles,
and we have to pay the I/O only once (and the memory cost never). As
a tradeoff, we no longer get delta compression of large objects.
That's OK for some large objects, like movie files (which don't tend
to delta well, anyway). But it's not OK for other objects, like virtual
machine images, which do tend to delta well.
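To see why a vm image deltas well, here is a toy sketch (not git's actual
copy/insert delta format, and all names are made up for illustration): a
small in-place edit to a large file can be stored as a shared prefix, a
shared suffix, and a tiny changed middle.

```python
import os

def make_delta(base: bytes, target: bytes):
    """Toy delta: shared prefix length, shared suffix length, changed middle.
    git's real packfile deltas are a copy/insert opcode stream; this only
    illustrates the size win for a localized edit."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    while s < limit - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return p, s, target[p:len(target) - s]

def apply_delta(base: bytes, delta):
    p, s, middle = delta
    return base[:p] + middle + base[len(base) - s:]

# a stand-in "vm image": 1 MiB of incompressible data with a 10-byte edit
base = os.urandom(1 << 20)
target = base[:500_000] + b"0123456789" + base[500_010:]

prefix, suffix, middle = make_delta(base, target)
assert apply_delta(base, (prefix, suffix, middle)) == target
print(len(target), len(middle))   # 1048576, with a middle of at most 10 bytes
```

The whole megabyte reconstructs from the base plus a handful of bytes,
which is exactly the case (vm images, disk dumps) the streaming path
currently gives up on.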
So can we devise a solution which efficiently stores these
delta-friendly objects, without losing the performance improvements
we got with the stream-directly-to-packfile approach?
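For reference, the stream-directly-to-packfile behavior described above is
controlled by core.bigFileThreshold; objects above the threshold are
streamed and never considered for delta compression. A config sketch (the
512m value is git's documented default, shown here explicitly):

```
[core]
	# objects larger than this are streamed into the pack
	# and excluded from delta compression
	bigFileThreshold = 512m
```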
gitconfig or gitattributes could specify big-file handlers per filetype.
It seems a bit ridiculous to expect git to autoconfigure big-file
handlers for everything from gifs to vm-images. For vm-images you would
read the "big-files" man-page and then configure your git with the
vm-image handler for whatever wildcards match your vm-image files. For
movie files you would likewise read the big-files man-page and configure
the movie-file handler for whatever your movie-file wildcards are. Movie
files and vm-images are quite plausible things to version-control, but
they are not typical source-code-management content, so you would
configure those handlers as needed. More widely-tracked file types like
gif, png, etc., could be autoconfigured by git to use the correct
big-file handler.
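As a sketch of what that per-wildcard configuration might look like in
gitattributes (note: bigfile-handler is not a real git attribute, and the
handler names are made up; this is only the shape of the idea):

```
# .gitattributes -- hypothetical per-wildcard big-file handler selection
*.img    bigfile-handler=vmimage
*.qcow2  bigfile-handler=vmimage
*.avi    bigfile-handler=movie
*.mkv    bigfile-handler=movie
# common image types could get a handler autoconfigured by git itself
```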
v/r,
neal