On Thu, 19 Mar 2009, Junio C Hamano wrote:
Scott Chacon <schacon@xxxxxxxxx> writes:
But where Git instead stores a stub object and the large binary object
is pulled in via a separate mechanism. I was thinking that the client
could set a max file size and when a binary object larger than that is
staged, Git instead writes a stub blob like:
==
blob [size]\0
[sha of large blob]
==
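A rough sketch of the staging rule described above, for illustration only. The helper names `git_blob_id` and `stage_blob` are hypothetical and not part of Git, and storing the id as 40 hex characters is an assumption, since the proposal only says "[sha of large blob]". The id itself is computed the way Git computes blob ids: SHA-1 over the "blob [size]\0" header followed by the content.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id Git gives a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def stage_blob(content: bytes, max_size: int) -> bytes:
    """Return the payload to store for `content` (hypothetical helper).

    Content no larger than max_size is stored unchanged. Anything larger
    is replaced by a stub whose payload is the hex id of the real blob
    (assumed encoding); the real blob would travel by a separate mechanism.
    """
    if len(content) <= max_size:
        return content
    return git_blob_id(content).encode("ascii")
```

With `max_size` set to 0, every non-empty file becomes a stub, which is the "0 byte" threshold case raised in the reply below.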
An immediate pair of questions are, if you can solve the issue by
delegating large media to somebody else (i.e. "media server"), and that
somebody else can solve the issues you are having, (1) what happens if you
lower that "large" threshold to "0 byte"? Does that somebody else still
work fine, and does the git that uses indirection also still work fine?
If so why are you using git instead of that somebody else altogether?
ideally the difference between using git with 'large' set to 0 and git
with no pack file should be an extra lookup for the indirection.
it may be that some other file manipulation may not be possible for
'large' files, resulting in some reduced functionality.
in any case, the added efficiency of using pack files (both for local
storage and for network transport) will make handling the 'large' files
worse than the same size files through git (assuming that they can benefit
from delta compression)
and
(2) what prevents us from stealing the trick that somebody else uses so
that git itself can natively handle large blobs without indirection?
the key thing is that large files do not get mmaped or considered for
inclusion in pack files (including cloning and pulling pack files)
to make them full first-class citizens you would need to make alternate
code paths for everything that currently does mmap, making those paths
process the file a different way. in the long run that may be the
best thing to do, but that's a lot of change compared to the proposed
change.
Without thinking the ramifications through myself, this sounds pretty much
like a band-aid and will end up hitting the same "blob is larger than we
can handle" issue when you follow the indirection eventually, but that is
just my gut feeling.
it depends on what you are doing with that file when you get to it. if you
have to mmap it you may run into the same problem. but if the file is a
streaming video, you can transport it around (with rsync, http, etc)
without a problem, and using the file (playing the video) never keeps much
of the file in memory, so it will be very useful on systems that would
never have a chance of accessing the entire file through mmap.
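as an illustration of handling a file without mapping all of it, here is a minimal sketch that computes a blob id by reading fixed-size chunks, so memory use stays near one chunk regardless of file size. the function name `stream_blob_id` and the 1 MiB default chunk size are assumptions made for this example; this is not how git itself is implemented.

```python
import hashlib
import os


def stream_blob_id(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the Git blob id of a file while holding one chunk in memory."""
    size = os.path.getsize(path)
    # Git hashes the header 'blob <size>\0' followed by the file content.
    digest = hashlib.sha1(b"blob %d\x00" % size)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

the same read-a-chunk, process-a-chunk loop is what lets rsync, http transfer, or a video player deal with files far larger than the available address space.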
David Lang