On Wed, Mar 28, 2012 at 11:38 AM, Bo Chen <chen@xxxxxxxxxxxxxx> wrote:
> Hi, Everyone. This is Bo Chen. I am interested in the idea of "Better
> big-file support".
>
> As it is described in the idea page,
> "Many large files (like media) do not delta very well. However, some
> do (like VM disk images). Git could split large objects into smaller
> chunks, similar to bup, and find deltas between these much more
> manageable chunks. There are some preliminary patches in this
> direction, but they are in need of review and expansion."
>
> Can anyone elaborate a little bit why many large files do not delta
> very well?

Large files are usually binary. Depending on the type of binary, they
may or may not delta well. Those that are compressed or encrypted
obviously don't delta well, because a single change to the input can
make the entire output different from that point on (there is a small
demo of this at the end of this mail).

Another problem with delta-ing large files in git is that the current
code needs both files fully in memory to compute a delta. Consuming
4GB of memory to delta two 2GB files does not sound good. (The
relevant interface is quoted at the end of this mail.)

> Is it a general problem or a specific problem just for Git?
> I am really new to Git, can anyone give me some hints on which source
> codes I should read to learn more about the current code on delta
> operation? It is said that "there are some preliminary patches in this
> direction", where can I find these patches?

Read about the rsync algorithm [2]. Bup [1] implements the same (I
think) algorithm, but on top of git; there is a toy sketch of the
chunking idea at the end of this mail. For the preliminary patches,
have a look at the jc/split-blob series at commit 4a1242d in git.git.

[1] https://github.com/apenwarr/bup
[2] http://en.wikipedia.org/wiki/Rsync#Algorithm
--
Duy
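
To make the compression point concrete, here is a toy standalone zlib
demo (my code, not git's): compress two buffers that differ in one
byte and see how far the outputs stay identical. Once the deflate
streams diverge, everything after the change differs, so a byte-wise
delta between the two compressed files saves almost nothing.

/* zlib-demo.c: one flipped input byte vs. compressed output.
 * Build: cc zlib-demo.c -lz
 */
#include <stdio.h>
#include <zlib.h>

int main(void)
{
	static unsigned char a[65536], b[65536];
	static unsigned char ca[131072], cb[131072];
	uLongf na = sizeof(ca), nb = sizeof(cb);
	unsigned long i, same;

	for (i = 0; i < sizeof(a); i++)
		a[i] = b[i] = (unsigned char)(i * 2654435761u >> 16);
	b[100] ^= 1;	/* a single-byte change near the start */

	compress(ca, &na, a, sizeof(a));
	compress(cb, &nb, b, sizeof(b));

	for (same = 0; same < na && same < nb && ca[same] == cb[same]; same++)
		; /* count the common compressed prefix */

	printf("compressed %lu vs %lu bytes, identical prefix: %lu bytes\n",
	       (unsigned long)na, (unsigned long)nb, same);
	return 0;
}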
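
On where to read: delta computation lives in diff-delta.c, with the
interface in delta.h. From memory (so double-check against the tree),
it looks roughly like this; the point to notice is that both the
source and the target are plain in-core buffers, so nothing in this
path can stream:

/* delta.h, paraphrased: the whole source is indexed in memory,
 * then the whole target is scanned against that index.
 */
struct delta_index;

struct delta_index *create_delta_index(const void *buf,
				       unsigned long bufsize);
void free_delta_index(struct delta_index *index);

void *create_delta(const struct delta_index *index,
		   const void *buf, unsigned long bufsize,
		   unsigned long *delta_size,
		   unsigned long max_delta_size);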
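
Finally, a toy version of the bup-style splitting idea (the window
size and boundary bits here are made up for illustration; see bup's
hashsplit code and the rsync page [2] for the real thing). A checksum
is rolled over a small window, and a chunk ends whenever the low bits
of the checksum are all ones. Boundaries depend only on nearby bytes,
so inserting data early in a file shifts only the chunks around the
insertion; later chunks keep the same content, hash to the same
objects, and cost nothing to store again.

/* split.c: content-defined chunking with an rsync-style
 * rolling checksum.  Build: cc split.c
 * Run: ./a.out <some-big-file
 */
#include <stdio.h>

#define WINDOW     64
#define CHUNK_BITS 13			/* ~8KB average chunk */
#define MASK       ((1u << CHUNK_BITS) - 1)

int main(void)
{
	unsigned char win[WINDOW] = { 0 };
	unsigned s1 = 0, s2 = 0;	/* rsync-style rolling sums */
	unsigned long off = 0, start = 0;
	int c;

	while ((c = getchar()) != EOF) {
		unsigned char old = win[off % WINDOW];
		win[off % WINDOW] = (unsigned char)c;
		s1 += (unsigned char)c - old;	/* sum of window bytes */
		s2 += s1 - WINDOW * old;	/* weighted sum, rolls too */
		off++;
		if ((s2 & MASK) == MASK) {	/* low bits all ones: cut */
			printf("chunk %lu..%lu\n", start, off);
			start = off;
		}
	}
	if (off > start)
		printf("chunk %lu..%lu\n", start, off);
	return 0;
}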