On Fri, Mar 30, 2012 at 03:11:40PM -0400, Bo Chen wrote:

> Just make clear one of my confusions. Delta operation is to find out
> the differences between different versions of the same file, right?
> As I know, delta encoding is to re-encode a file based on the
> differences between neighboring blocks, thus can help compress a file
> since after delta encoding, we will have more similar data within the
> file. Can anyone elaborate a little bit what is the relation between
> delta operation in git and delta encoding listed above? Thanks.

Sort of. Git is snapshot based. So each version of a file is its own
"object", and from a high-level view, we store all objects. But we store
the logical objects themselves in packfiles, in which the actual
representation of the object may be stored as a difference to another
object (which is likely to be a different version of the same file, but
does not have to be).

Here's some background reading:

  http://progit.org/book/ch1-3.html

  http://progit.org/book/ch9-4.html

> I am wondering why we cannot divide the 2 2GB files into chunks and
> delta chunks by chunks. Is that any difference, except a little more
> IOs?

It's more complicated than that. What if the file is re-ordered? You
would want to compare early chunks in one version against later chunks
in the other. So yes, you can reduce memory pressure by doing more I/O,
but doing too much I/O will be very slow. Coming up with a solution is
part of what this project is about. And chunking is part of that
solution.

> > Read about rsync algorithm [2]. Bup [1] implements the same (I think)
> > algorithm, but on top of git. For preliminary patches, have a look at
> > jc/split-blob series at commit 4a1242d in git.git.
>
> Make clear my another confusion. The file which has been updated
> (added, deleted, and modified) is first delta-compressed, and then
> synchronize to the remote repo by some mechanism (rsync?).
> I am wondering what is the relationship between delta operation and
> rsync.

No, the updated file is delta compressed into a packfile, and the
packfile is transmitted. Rsync comes into play because it uses a novel
chunking algorithm, which was copied by bup (and is referred to as the
"bupsplit" algorithm). Read up on how bup works and why it was invented.

-Peff
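
P.S. To make the re-ordering point concrete, here is a minimal Python
sketch of content-defined chunking. It is not git's or bup's code: the
function names, the 48-byte window, the 8-bit mask and the plain rolling
sum are all assumptions chosen for illustration (real tools use a
stronger rolling checksum). It only shows why boundaries chosen from the
content survive a re-order, while fixed-offset chunks do not.

```python
import hashlib
import random

WINDOW = 48          # bytes of context that decide a boundary (assumed value)
MASK = (1 << 8) - 1  # split when the low 8 bits are all ones (~256-byte chunks)


def split_content_defined(data):
    """Split data where a rolling sum over the last WINDOW bytes hits MASK."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks


def split_fixed(data, size=256):
    """Split data at fixed offsets, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old_chunks, new_chunks):
    """Fraction of distinct new chunks that already exist among the old ones."""
    old_ids = {hashlib.sha1(c).digest() for c in old_chunks}
    new_ids = {hashlib.sha1(c).digest() for c in new_chunks}
    return len(old_ids & new_ids) / len(new_ids)


def reorder_demo():
    """Swap two unequal parts of a pseudo-random file and compare both schemes."""
    rng = random.Random(0)
    data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
    cut = 20001                          # deliberately not a multiple of 256
    reordered = data[cut:] + data[:cut]
    fixed = shared_fraction(split_fixed(data), split_fixed(reordered))
    content = shared_fraction(split_content_defined(data),
                              split_content_defined(reordered))
    return fixed, content
```

Calling reorder_demo() returns the reusable fraction for each scheme.
Because a boundary depends only on the preceding WINDOW bytes, almost
every chunk of the re-ordered file is found again, whereas the
fixed-offset chunks no longer line up.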