hi all,
there's a major feature for working with large binaries that has not
yet been addressed by git: the ability to check out a file as a
symbolic/hard link to a blob in the repository, instead of duplicating
the file into the working copy.
imagine a scenario where one user is putting large binary files into a
git repo on a networked server. 100 other users on the server need
read-only access to this repo. they clone the repo using --shared or
--local, which saves disk space for the object files, but each of
these 100 working copies still duplicates all the binary files at the
HEAD revision. it would be roughly 100x more efficient in both disk
space and checkout speed if, in place of these files, symbolic or hard
links were made to the blob files in .git/objects.
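to make the idea concrete, here's a minimal sketch of the proposed
checkout step in python (not git code; the function name and paths are
hypothetical): hard-link the stored file data into the working copy
instead of copying it, so the data exists once on disk.

```python
# sketch of a link-based checkout: the object store and the working
# copy share one inode, so no data is duplicated and "checkout" is
# just a link() call. names/paths here are illustrative only.
import os
import tempfile

def checkout_as_link(blob_path: str, worktree_path: str) -> None:
    # hard-link instead of copying; requires both paths to live on
    # the same filesystem
    os.link(blob_path, worktree_path)

with tempfile.TemporaryDirectory() as d:
    blob = os.path.join(d, "stored_blob")
    with open(blob, "wb") as f:
        f.write(b"x" * 1024)  # stand-in for a large binary file
    wc = os.path.join(d, "worktree_file")
    checkout_as_link(blob, wc)
    # both names now point at the same inode
    same_inode = os.stat(blob).st_ino == os.stat(wc).st_ino
    link_count = os.stat(wc).st_nlink

assert same_inode
assert link_count == 2
```

note that with hard links, a write through the working-copy name would
modify the stored object too, which is why this only fits the
read-only-users scenario above.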
the crux of the issue is that the blob objects would have to be stored
as exact copies of the original files. two things currently prevent
this: 1) blobs are stored zlib-compressed, and 2) each one has a small
header ("blob <size>\0") prepended to the file data. compression can
be disabled by setting core.loosecompression to 0, so that seems like
less of an issue. as for the header, couldn't it be stored separately?
in other words, store two files per blob: a small stub file with the
header info, and the unaltered file data.
what are the caveats to a system like this? has anyone looked into
this before?
-chad
p.s.
i tried submitting a post through nabble a few days ago and it said
that it was still pending, so i thought i'd try submitting directly to
the mailing list. sorry if i end up double-posting
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html