Re: How should I handle binary file with GIT

On Wed, 5 Apr 2006, Randal L. Schwartz wrote:

> >>>>> "Nicolas" == Nicolas Pitre <nico@xxxxxxx> writes:
> 
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
> 
> Nicolas> We already have our own delta code for pack storage.
> 
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

First of all, is cherry-picking binary patches a sensible thing to
do?

Do you expect, say, a Word document, a JPEG image, or an MP3 file to
still be valid and error-free if two binary patches, each modifying a
different part of the same file (same revision), are applied in
succession?  I seriously doubt it.

And what do you do with conflicts?  Using diff3 might be sensible for
text data, but for binaries you really need a tool that understands the
type of data your binary contains.  That means one tool for each
possible type of binary data, which is outside the scope of GIT.

For example, if you patch a .wav file to add some data, you end up
with the additional samples and a new length in the file header.  If
another patch to that .wav is applied, it is easy to find the
"surrounding context" where the second patch adds or removes some
other samples, but you really need knowledge of the .wav format
to handle the conflict that will occur on the .wav header modification.

And so on for all possible binary types.
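
To make the .wav case concrete, here is a small Python sketch (an
illustration only, not something GIT does).  It assumes the canonical
44-byte PCM header that Python's wave module writes, with the RIFF
chunk size at byte offset 4 and the data chunk size at offset 40.  The
helper names make_wav and header_sizes are made up for this example.

```python
import io
import struct
import wave


def make_wav(frames):
    """Build a mono, 16-bit, 8 kHz WAV file in memory from raw frame bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(frames)
    return buf.getvalue()


def header_sizes(wav):
    """Return (RIFF chunk size, data chunk size) from a 44-byte PCM header."""
    riff_size = struct.unpack_from("<I", wav, 4)[0]
    data_size = struct.unpack_from("<I", wav, 40)[0]
    return riff_size, data_size


silence = b"\x00\x00" * 100                          # 100 samples
base = make_wav(silence)
version_a = make_wav(b"\x01\x00" * 10 + silence)     # 10 samples added in front
version_b = make_wav(silence + b"\x02\x00" * 20)     # 20 samples added at the end

# The sample data changes in two different places, but both versions
# rewrite the same two header fields with different values.
for name, blob in (("base", base), ("a", version_a), ("b", version_b)):
    print(name, header_sizes(blob))
```

The two sets of added samples do not overlap, yet both versions touch
the same header bytes.  Picking either side's header is wrong for the
combined file; only a tool that knows the fields are lengths can
recompute them.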

So IMHO a binary patch format is only useful for easy _transport_ along
with other text patches.  And the binary patch must either apply
perfectly against the same source file or not apply at all.
That's the only sensible accommodation we can make with a generic binary
patch format.
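
The "apply perfectly or not at all" rule could be sketched as follows
in Python.  This is a hypothetical patch layout, not GIT's actual binary
patch format: the BinaryPatch class, its fields, and the use of the full
new content in place of a real delta are all assumptions made for the
example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryPatch:
    source_sha1: str    # hash of the exact file the patch was made against
    new_content: bytes  # stand-in for a real delta against that file


def make_patch(old, new):
    """Record which source the patch belongs to, along with the result."""
    return BinaryPatch(hashlib.sha1(old).hexdigest(), new)


def apply_patch(current, patch):
    """Apply only if `current` is byte-for-byte the expected source."""
    if hashlib.sha1(current).hexdigest() != patch.source_sha1:
        raise ValueError("binary patch does not apply: source file differs")
    return patch.new_content


old = b"\x00\x01\x02\x03"
new = b"\x00\x01\xff\x03\x04"
patch = make_patch(old, new)

print(apply_patch(old, patch) == new)      # same source: applies cleanly

try:
    apply_patch(b"\x00\x01\x02\x09", patch)  # locally modified source
except ValueError as err:
    print(err)
```

There is deliberately no fuzz and no partial application: any
difference in the source file is reported as a failure to apply.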

When the patch doesn't apply to your tree, nothing prevents you
from hooking in a dedicated tool that picks up the original file, the
remote version reconstructed from the binary patch you received,
and your own modified version, so that the tool can process them and make
the necessary changes with proper knowledge of the data format.
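
A rough Python sketch of such a hook, under stated assumptions: the
MERGE_TOOLS registry, the resolve function and the toy ".lp" format (a
4-byte little-endian length followed by the payload) are all invented
for illustration and correspond to nothing in GIT.  The point is only
that the tool receives the three versions and uses format knowledge to
rebuild the header.

```python
import os
import struct

MERGE_TOOLS = {}  # file extension -> callable(base, theirs, ours) -> bytes


def pack(payload):
    """Toy format: 4-byte little-endian length header, then the payload."""
    return struct.pack("<I", len(payload)) + payload


def unpack(blob):
    (length,) = struct.unpack_from("<I", blob, 0)
    payload = blob[4:]
    if len(payload) != length:
        raise ValueError("corrupt length header")
    return payload


def merge_lp(base, theirs, ours):
    """Format-aware merge: keep both appended tails, recompute the length."""
    b, t, o = unpack(base), unpack(theirs), unpack(ours)
    if not (t.startswith(b) and o.startswith(b)):
        raise ValueError("this toy tool only merges appends")
    return pack(b + o[len(b):] + t[len(b):])


MERGE_TOOLS[".lp"] = merge_lp


def resolve(path, base, theirs, ours):
    """Hand the three versions to the tool registered for this file type."""
    ext = os.path.splitext(path)[1].lower()
    tool = MERGE_TOOLS.get(ext)
    if tool is None:
        raise LookupError("no format-aware merge tool for %r" % ext)
    return tool(base, theirs, ours)


base = pack(b"abc")
theirs = pack(b"abcTT")   # remote version rebuilt from the received patch
ours = pack(b"abcO")      # local modification
merged = resolve("sample.lp", base, theirs, ours)
print(unpack(merged))
```

File types with no registered tool simply fail, which matches the
"must not apply at all" behaviour for the generic case.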


Nicolas