Hey,

On Thu, Mar 19, 2009 at 3:31 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Scott Chacon <schacon@xxxxxxxxx> writes:
>
>> But where Git instead stores a stub object and the large binary object
>> is pulled in via a separate mechanism. I was thinking that the client
>> could set a max file size and when a binary object larger than that is
>> staged, Git instead writes a stub blob like:
>>
>> ==
>> blob [size]\0
>> [sha of large blob]
>> ==
>
> An immediate pair of questions are, if you can solve the issue by
> delegating large media to somebody else (i.e. "media server"), and that
> somebody else can solve the issues you are having, (1) what happens if you
> lower that "large" threshold to "0 byte"? Does that somebody else still
> work fine, and does the git that uses indirection also still work fine?
> If so why are you using git instead of that somebody else altogether? and

In theory it would work fine: all the commits and trees would be
transferred over git and all the blobs would be stored elsewhere, but I
assume it would be much slower for the end user, so nobody would do that.
I would imagine users would only enable this if they have large media
files that they don't want every version of cloned every time. I can't
imagine this being used by more than a small percentage of users, but
when large media does need to live alongside source code, people either
won't use Git at all (they'll use Perforce or SVN), or they'll put it in
anyway and then kill their (or our) servers when upload-pack tries to
mmap it (twice, yes?). I thought it would be much more efficient if Git
could simply mark files that don't make sense to pack, keep track of
them, and transfer them via a more appropriate protocol.

> (2) what prevents us from stealing the trick that somebody else uses so
> that git itself can natively handle large blobs without indirection?

Actually, I'm fine with that. Phase two of this project, if it made sense
at all, would be another set of git transfer commands that let large
blobs be uploaded and downloaded separately - importantly, never putting
them in a packfile, and keeping them loose, uncompressed and headerless
on disk so they can simply be streamed when requested. I am thinking
entirely of movies and images that are already compressed, where there is
simply no need to load them into memory at all. I just thought that
taking advantage of services that already do this (scp, sftp, s3) would
be quicker than building another set of transfer protocols into Git.

> Without thinking the ramifications through myself, this sounds pretty much
> like a band-aid and will end up hitting the same "blob is larger than we
> can handle" issue when you follow the indirection eventually, but that is
> just my gut feeling.

The point is that we don't keep this data as 'blob's at all - we don't
try to compress it or add the object header, because it's too big and
already compressed; doing so is a waste of time and often outside the
memory tolerance of many systems. We keep only the stub in our db and
stream the large media content directly to and from disk. If we do a
'checkout' or something else that would switch it out, we could store the
data in '.git/media' or the equivalent until it's uploaded elsewhere.
(There's a rough sketch of what the client side might look like at the
bottom of this mail.)

> This is an off-topic "By the way", but has the other topic addressed to
> you on git-scm.com/about been resolved in any way yet?

Thanks for pointing that out, I missed that thread.
I actually just pushed out some changes over the last few days - I added
the Gnome project since they just announced they're moving to Git, added
a link to the new O'Reilly book that was just released, and pulled in
some validation and other misc changes that had been contributed.
Currently I have to re-gen the Authors data manually, so I do it every
once in a while - I just pushed up new data. Doing it per release is a
good idea; I'll try to get that into the release script.
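
For what it's worth, here's a rough sketch (in Python, purely for
illustration) of what the client-side "stage or stub" decision could look
like. Nothing here exists in git today; the threshold, the '.git/media'
directory name and the stub layout are all assumptions based on the idea
described above, not a real design:

==
# Purely illustrative, untested sketch -- MAX_SIZE, MEDIA_DIR and the
# stub layout are assumptions, not anything git actually does.
import hashlib
import os
import shutil

MAX_SIZE = 50 * 1024 * 1024                  # hypothetical "large" threshold
MEDIA_DIR = os.path.join(".git", "media")    # hypothetical local media cache

def content_to_stage(path):
    """Return the bytes git would store as the blob for `path`.

    Small files come back unchanged; large files are copied into
    .git/media/<sha> and replaced by a small stub recording the sha
    (and size) of the real content.
    """
    size = os.path.getsize(path)
    if size <= MAX_SIZE:
        with open(path, "rb") as f:
            return f.read()

    # Hash the real content in a streaming fashion so the whole file
    # is never held in memory.
    sha = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha.update(chunk)
    digest = sha.hexdigest()

    # Park the real content locally until it can be pushed to the
    # media server (scp, sftp, s3, whatever ends up being used).
    if not os.path.isdir(MEDIA_DIR):
        os.makedirs(MEDIA_DIR)
    shutil.copyfile(path, os.path.join(MEDIA_DIR, digest))

    # Only this small stub goes into the object database.
    return ("stub\n%s\n%d\n" % (digest, size)).encode()
==

The transfer side would then presumably just ship whatever is sitting in
'.git/media' over scp/sftp/s3 and fetch missing entries by sha when a
checkout needs them.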