On Wed, May 27, 2009 at 01:37:26PM -0400, Nicolas Pitre wrote:

> My idea for handling big files is simply to:
>
> 1) Define a new parameter to determine what is considered a big file.
>
> 2) Store any file larger than the threshold defined in (1) directly
>    into a pack of their own at "git add" time.
>
> 3) Never attempt to diff nor delta large objects, again according to
>    (1) above. It is typical for large files not to be deltifiable,
>    and a diff for files in the thousands of megabytes cannot possibly
>    be sane.

What about large files that have a short metadata section that may
change? Versions with only the metadata changed delta well and, with a
custom diff driver, can produce useful diffs. And I don't think that is
an impractical or unlikely example; large files are often tagged media.

Linus' "split into multiple objects" approach means you could perhaps
split intelligently into metadata and "uninteresting data" sections
based on the file type. That would make things like rename detection
very fast. Of course it has the downside that you are cementing
whatever split you made into history for all time, and it means that
two people adding the same content might end up with different trees.
Both are things that git tries to avoid.

I wonder if it would be useful to make such a split at _read_ time.
That is, still refer to the sha-1 of the whole content in the tree
objects, but have a separate cache that says "hash X splits to the
concatenation of Y,Z". Thus you can always refer to the "pure" object,
both as a user and in the code. So we could avoid retrofitting all of
the code -- only a few parts, like diff, might want to handle an
object in multiple segments.

-Peff
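For what it's worth, the knob in (1) could be as simple as a config
variable; the name and default below are made up purely for
illustration:

  # hypothetical configuration; the name and value are illustrative only
  [core]
          bigFileThreshold = 512m

Any blob at or above that size would then go straight into a pack of
its own at "git add" time and be skipped by the delta and diff
machinery, per (2) and (3).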
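On the tagged-media point, the existing diff driver machinery is
enough to sketch what I mean. The driver name below is made up, and it
assumes exiftool (or any tool that dumps metadata as text) is
available:

  # .gitattributes
  *.mp3  diff=mediameta
  *.flac diff=mediameta

  # repository config
  [diff "mediameta"]
          textconv = exiftool

With that in place, a tag-only change shows up in "git diff" as a few
changed lines of exiftool output rather than "Binary files differ".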
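And to make the read-time split idea a bit more concrete, here is a
toy, standalone sketch of the cache lookup. None of these types or
names correspond to anything in git's code, and the hashes are
placeholders; it is just the shape of "hash X splits to Y,Z" with a
fallback to reading the whole object:

  /*
   * Toy sketch only: a read-time "split cache" mapping the sha-1 of a
   * whole object to the sha-1s of segments whose concatenation equals
   * its content.
   */
  #include <stdio.h>
  #include <string.h>

  struct split_entry {
          const char *whole;        /* sha-1 of the full content */
          const char *segments[8];  /* NULL-terminated segment sha-1s */
  };

  /* "hash X splits to the concatenation of Y,Z" */
  static const struct split_entry split_cache[] = {
          { "X", { "Y", "Z", NULL } },
  };

  static const struct split_entry *lookup_split(const char *sha1)
  {
          size_t i;
          for (i = 0; i < sizeof(split_cache) / sizeof(split_cache[0]); i++)
                  if (!strcmp(split_cache[i].whole, sha1))
                          return &split_cache[i];
          return NULL;
  }

  int main(void)
  {
          const struct split_entry *e = lookup_split("X");
          int i;

          if (!e) {
                  printf("no split entry; read the whole object\n");
                  return 0;
          }
          for (i = 0; e->segments[i]; i++)
                  printf("read segment %s\n", e->segments[i]);
          return 0;
  }

The trees still record only X, so the split stays invisible to users
and to most of the code; only the readers that care, like diff, would
ever ask for the segments.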