Sorry for the late reply. My questions are inline below.

On Wed, Mar 28, 2012 at 2:19 AM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Wed, Mar 28, 2012 at 11:38 AM, Bo Chen <chen@xxxxxxxxxxxxxx> wrote:
>> Hi, Everyone. This is Bo Chen. I am interested in the idea of "Better
>> big-file support".
>>
>> As it is described in the idea page,
>> "Many large files (like media) do not delta very well. However, some
>> do (like VM disk images). Git could split large objects into smaller
>> chunks, similar to bup, and find deltas between these much more
>> manageable chunks. There are some preliminary patches in this
>> direction, but they are in need of review and expansion."
>>
>> Can anyone elaborate a little bit why many large files do not delta
>> very well?
>
> Large files are usually binary. Depends on the type of binary, they
> may or may not delta well. Those that are compressed/encrypted
> obviously don't delta well because one change can make the final
> result completely different.

Let me clear up one point of confusion first. The delta operation finds
the differences between two versions of the same file, right? As I
understand it, delta encoding re-encodes a file in terms of the
differences between neighboring blocks, which helps compression because
the encoded data contains more repetition. Can anyone explain how the
delta operation in Git relates to delta encoding as described above?
Thanks.

>
> Another problem with delta-ing large files with git is, current code
> needs to load two files in memory for delta. Consuming 4G for delta 2
> 2GB files does not sound good.

I am wondering why we cannot divide the two 2GB files into chunks and
delta them chunk by chunk. Is there any difference, apart from a little
more I/O?

>
>> Is it a general problem or a specific problem just for Git?
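On the chunk-by-chunk question: one catch with cutting files at fixed offsets is that inserting a few bytes shifts every later block, so nothing after the edit matches the old version anymore. bup avoids this by choosing cut points from the content itself, using a rolling checksum similar to the weak checksum in the rsync algorithm. Below is a minimal sketch of that idea, assuming an rsync-style checksum over a 64-byte window and a 12-bit boundary mask; the constants and function names are illustrative assumptions, not code from git.git, the jc/split-blob patches, or bup.

```python
import random

# Illustrative sketch only: WINDOW, MASK and the function names are assumptions,
# not code from git.git, the jc/split-blob patches, or bup.
WINDOW = 64             # bytes covered by the rolling checksum
MASK = (1 << 12) - 1    # cut when the low 12 bits are all ones (~4 KiB average chunk)


def content_defined_chunks(data: bytes, window: int = WINDOW, mask: int = MASK) -> list[bytes]:
    """Cut `data` wherever a rolling checksum of the last `window` bytes matches `mask`.

    The checksum has the shape of rsync's weak checksum:
      a = sum of the bytes in the window
      b = sum of the bytes weighted by position (newest byte 1, oldest byte `window`)
    Both are kept modulo 2**16 and updated in O(1) per byte.
    """
    chunks = []
    start = 0
    a = b = 0
    for i, byte in enumerate(data):
        if i >= window:
            # Slide the window: drop the oldest byte, whose weight in b is `window`.
            out = data[i - window]
            a = (a - out) & 0xFFFF
            b = (b - window * out) & 0xFFFF
        # Append the new byte: every remaining weight grows by one and the new
        # byte gets weight 1, so b gains the updated a.
        a = (a + byte) & 0xFFFF
        b = (b + a) & 0xFFFF
        # The cut decision depends only on the window's content, not on offsets.
        if (b & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_size_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Naive alternative: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_chunks(old: list[bytes], new: list[bytes]) -> int:
    """Count chunks of `new` that do not already exist in `old`."""
    known = set(old)
    return sum(1 for chunk in new if chunk not in known)


if __name__ == "__main__":
    original = random.Random(42).randbytes(256 * 1024)
    # Simulate a small edit: insert 16 bytes in the middle of the file.
    edited = original[:100_000] + b"X" * 16 + original[100_000:]

    for name, split in (("content-defined", content_defined_chunks),
                        ("fixed-size", fixed_size_chunks)):
        old, new = split(original), split(edited)
        print(f"{name}: {changed_chunks(old, new)} of {len(new)} chunks changed")
```

With content-defined cuts, only the chunk or two around the edit should differ, while with fixed-size cuts every block from the edit point onward changes. Real chunkers also enforce minimum and maximum chunk sizes, which this sketch omits.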
>> I am really new to Git, can anyone give me some hints on which source
>> codes I should read to learn more about the current code on delta
>> operation? It is said that "there are some preliminary patches in this
>> direction", where can I find these patches?
>
> Read about rsync algorithm [2]. Bup [1] implements the same (I think)
> algorithm, but on top of git. For preliminary patches, have a look at
> jc/split-blob series at commit 4a1242d in git.git.

Let me clear up another point of confusion. As I understand it, a file
that has been updated (content added, deleted, or modified) is first
delta-compressed and then synchronized to the remote repository by some
mechanism (rsync?). What is the relationship between the delta
operation and rsync?

>
> [1] https://github.com/apenwarr/bup
> [2] http://en.wikipedia.org/wiki/Rsync#Algorithm
> --
> Duy

Bo