On Thu, 19 Mar 2009, Junio C Hamano wrote:
> Scott Chacon <schacon@xxxxxxxxx> writes:
>> The point is that we don't keep this data as 'blob's - we don't try to
>> compress them or add the header to them; they're too big and already
>> compressed, so it's a waste of time and often outside the memory
>> tolerance of many systems. We keep only the stub in our db and stream
>> the large media content directly to and from disk. If we do a
>> 'checkout' or something that would switch it out, we could store the
>> data in '.git/media' or the equivalent until it's uploaded elsewhere.
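The stub-plus-media-directory idea Scott describes can be sketched in plain shell. Everything here is illustrative: the `media <sha1>` stub format, the `repo/media` directory, and the file names are made up for the example, not an actual git feature.

```shell
# Hypothetical sketch: git would track only a tiny stub naming the
# content's sha1, while the real bytes live outside the object
# database in a .git/media-like directory, stored by hash.
mkdir -p repo/media
printf 'fake mpeg data' > movie.mpg            # stand-in for a huge file
sha1=$(sha1sum movie.mpg | cut -d' ' -f1)      # hash of the real content
mv movie.mpg "repo/media/$sha1"                # keep the bytes by hash
printf 'media %s\n' "$sha1" > repo/movie.mpg   # the small stub git tracks
cat repo/movie.mpg
```

Because the stub is small and uncompressed media never enters the object database, normal git operations stay cheap regardless of how large the media files get.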
> Aha, that sounds like you can just maintain a set of out-of-tree symbolic
> links that you keep track of, and let other people (e.g. rsync) deal with
> the complexity of managing that side of the world.
>
> And I think you can start experimenting with it without any change to the
> core datastructures. In your single-page web site, whose sole html file
> embeds an mpeg movie, you keep track of these two things in git:
>
>     porn-of-the-day.html
>     porn-of-the-day.mpg -> ../media/6066f5ae75ec.mpg
>
> and any time you want to feed a new movie, you update the symlink to point
> at a different one that lives outside the source-controlled tree, while
> arranging for the link target to be updated out-of-band.
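Junio's scheme above needs nothing beyond a symlink and some out-of-band sync. A minimal sketch (the second hash is made up to stand for the next day's movie; paths match the example above):

```shell
# git would track site/porn-of-the-day.mpg as a symlink blob; the
# targets live outside the tree and are synced by rsync or similar.
mkdir -p site media
printf 'movie v1' > media/6066f5ae75ec.mpg
ln -sf ../media/6066f5ae75ec.mpg site/porn-of-the-day.mpg
# Feeding a new movie is just repointing the symlink (and committing):
printf 'movie v2' > media/7077a6bf86fd.mpg
ln -sf ../media/7077a6bf86fd.mpg site/porn-of-the-day.mpg
cat site/porn-of-the-day.mpg   # resolves through the link to the new target
```

git already stores symlinks natively (as blobs containing the target path), so history records which movie each revision pointed at, even though the movie bytes themselves are never in the repository.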
That would work, but the proposed change has some advantages:

1. you store the sha1 of the real mpg in the 'large file' blob, so you can
   detect problems
2. since git knows the sha1 of the real file, it can auto-create the real
   file as needed, without wasting space on too many copies of it.
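Both advantages follow from the stub carrying the content hash. A rough sketch, again assuming a hypothetical `media <sha1>` stub line and a hash-addressed store directory (neither is an existing git mechanism):

```shell
# Because the stub records the real file's sha1, a checkout can both
# recreate the content from a shared store and verify its integrity.
mkdir -p store worktree
printf 'big media payload' > original
sha1=$(sha1sum original | cut -d' ' -f1)
mv original "store/$sha1"                        # one deduplicated copy
printf 'media %s\n' "$sha1" > worktree/clip.mpg  # the stub git would track
# "Auto-create" the real file from the stub, then detect corruption
# by checking the bytes against the sha1 named in the stub:
want=$(cut -d' ' -f2 worktree/clip.mpg)
cp "store/$want" worktree/clip.real
echo "$want  worktree/clip.real" | sha1sum -c -
```

With plain symlinks you get neither property: a silently corrupted or swapped target is indistinguishable from a legitimate update, and every checkout location needs its own copy of the target tree.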
David Lang
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html