read-only working copy using symlinks to blobs

chadrik <chadrik@xxxxxxxxx> · Wed, 21 Jan 2009 00:15:11 -0800 (PST)

hi all,
i'm looking into using git to manage a lot of very large binary data.  git
seems particularly suited to this task because it has features for saving
disk space such as clone --shared, and it's fast due to simple compression by
default (instead of deltas).

in my mind, there's still one major feature for working with large binaries
that has not been addressed:  the ability to check out symbolic/hard links
to blobs into the working copy instead of creating duplicates of the files.

imagine a scenario where one user is putting large binary files into a git
repo.  100 other users need read-only access to this repo.  they clone the
repo shared, which saves disk space for the object files, but each of these
100 working copies also creates copies of all the binary files at the HEAD
revision.  it would be roughly 100x as space-efficient if, in place of these
files, symbolic or hard links were made to the blob files in .git/objects.

the crux of the issue is that the blob objects would have to be stored as
exact copies of the original files.  i did some googling and it would seem
there are two things that currently prevent this from happening.  1) blobs
are stored with compression and 2) they include a small header.  compression
can be disabled by setting core.loosecompression to 0, so that seems like
less of an issue.  as for the header, wouldn't it be possible to store it as
a separate file per blob object and thus keep the original data completely
pristine? 
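
to make the two obstacles concrete, here is a minimal python sketch of how a
loose blob ends up on disk, as far as i understand the format: the header
"blob <size>\0" is prepended, the result is hashed with sha-1, and the same
bytes are run through zlib.  the function name loose_object is made up for
illustration, and using zlib level 0 as a stand-in for core.loosecompression = 0
assumes git hands that level straight to zlib.

```python
import hashlib
import zlib


def loose_object(content: bytes, level: int = 0):
    """Return (object id, on-disk bytes) for a blob; hypothetical helper."""
    # obstacle 2: the header "blob <size>\0" is prepended to the file data
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    # the object id is the sha-1 of header + content, not of the file alone
    object_id = hashlib.sha1(store).hexdigest()
    # obstacle 1: the stored bytes go through zlib; level 0 means "no
    # compression", but zlib still adds its own framing around the data
    on_disk = zlib.compress(store, level)
    return object_id, on_disk


content = b"hello\n"
oid, raw = loose_object(content)
print("path:", ".git/objects/%s/%s" % (oid[:2], oid[2:]))
print("file size:", len(content), "on-disk size:", len(raw))
print("identical to original file:", raw == content)
```

if this sketch is right, even an uncompressed loose object differs from the
original file by both the header and the zlib framing, so a link to it would
not read back as the original data.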

what are the caveats to a system like this?  any thoughts on the
feasibility?

-chad
