Re: GSoC - Some questions on the idea of "Better big-file support".

On Wed, Mar 28, 2012 at 11:38 AM, Bo Chen <chen@xxxxxxxxxxxxxx> wrote:
> Hi, Everyone. This is Bo Chen. I am interested in the idea of "Better
> big-file support".
>
> As it is described in the idea page,
> "Many large files (like media) do not delta very well. However, some
> do (like VM disk images). Git could split large objects into smaller
> chunks, similar to bup, and find deltas between these much more
> manageable chunks. There are some preliminary patches in this
> direction, but they are in need of review and expansion."
>
> Can anyone elaborate a little bit why many large files do not delta
> very well?

Large files are usually binary. Depending on the type of binary, they
may or may not delta well. Those that are compressed or encrypted
obviously don't delta well, because a single change in the input can
make the final result completely different.
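
A rough way to see this effect for yourself (a minimal sketch, not
git code; the sample data and the differing_bytes helper are made up
for illustration) is to change one byte of some input and compare how
many bytes differ before and after compression with zlib:

```python
import hashlib
import zlib

# Deterministic, moderately compressible sample data (hex digests of a counter).
original = "".join(
    hashlib.sha256(str(i).encode()).hexdigest() for i in range(2000)
).encode()
edited = b"X" + original[1:]  # change exactly one byte at the start


def differing_bytes(x, y):
    """Count positions that differ, plus any difference in length."""
    return sum(p != q for p, q in zip(x, y)) + abs(len(x) - len(y))


raw_diff = differing_bytes(original, edited)
packed_diff = differing_bytes(zlib.compress(original), zlib.compress(edited))
print(raw_diff, "raw byte(s) differ;", packed_diff, "compressed byte(s) differ")
```

The uncompressed inputs differ in a single byte, while the compressed
streams differ in more places than that, which leaves a delta
algorithm much less common data to work with.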

Another problem with delta-ing large files in git is that the current
code needs to load both files into memory to compute the delta.
Consuming 4GB of memory to delta two 2GB files does not sound good.

> Is it a general problem or a specific problem just for Git?
> I am really new to Git, can anyone give me some hints on which source
> codes I should read to learn more about the current code on delta
> operation? It is said that "there are some preliminary patches in this
> direction", where can I find these patches?

Read about the rsync algorithm [2]. Bup [1] implements (I think) the
same algorithm, but on top of git. For the preliminary patches, have a
look at the jc/split-blob series at commit 4a1242d in git.git.
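
To give a feel for the chunking idea, here is a minimal sketch of
content-defined splitting driven by an rsync-style rolling checksum.
It is not the code from bup or from the jc/split-blob series; WINDOW,
MASK and split_chunks are names and values I assumed for illustration,
and a real implementation would also enforce minimum and maximum chunk
sizes.

```python
import hashlib
import random

WINDOW = 64            # bytes covered by the rolling checksum (assumed value)
MASK = (1 << 11) - 1   # cut when the low 11 bits are all set: ~2 KiB average


def split_chunks(data):
    """Split data at content-defined boundaries using a rolling checksum."""
    chunks = []
    start = 0
    a = b = 0  # a: plain sum of the window, b: position-weighted sum
    for i, new in enumerate(data):
        # Byte leaving the window; the window starts out filled with zeros.
        old = data[i - WINDOW] if i >= WINDOW else 0
        a = (a + new - old) & 0xFFFF
        b = (b + a - WINDOW * old) & 0xFFFF
        if (b & MASK) == MASK:          # boundary depends only on recent bytes
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])     # trailing chunk without a boundary
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(200_000))
edited = original[:100_000] + b"a few inserted bytes" + original[100_000:]

ids_a = {hashlib.sha1(c).hexdigest() for c in split_chunks(original)}
ids_b = {hashlib.sha1(c).hexdigest() for c in split_chunks(edited)}
print(len(ids_a), "chunks;", len(ids_a - ids_b), "changed by the insertion")
```

Because each boundary depends only on the last WINDOW bytes, an
insertion in the middle of the file disturbs only the chunks around
the edit. The chunks before it, and those far enough after it, keep
the same content and hash, so they can be stored once and shared.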

[1] https://github.com/apenwarr/bup
[2] http://en.wikipedia.org/wiki/Rsync#Algorithm
-- 
Duy