Re: On Tracking Binary Files

Junio,

Thanks a lot for your thorough explanation.

Patrick

On Tue, Apr 14, 2009 at 16:05, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>
>> On Tue, 14 Apr 2009, Patrick Berkeley wrote:
>>
>>> Does Git track the deltas on binary files?
>>>
>>> Someone in #git mentioned that if the binaries change too much Git no
>>> longer just stores the changes. If this is the case, what is the
>>> breaking point where Git goes from storing the deltas to storing the
>>> entire new file?
>>
>> Git does not store the deltas as you think it does.  The deltification of
>> the objects is almost independent of the commit history, i.e. for all
>> practical purposes we _always_ store snapshots.
>
> "Always store snapshots" sounds as if you are not storing deltas at all.
> I think I know what you meant to say, but the way you phrased it is
> misleading.
>
> Documentation/technical/pack-heuristics.txt talks about this in some
> detail.  A short version is:
>
>  - It does not make a difference whether you are dealing with binary or
>   text files;
>
>  - The delta is not necessarily against the same path in the previous
>   revision, so even a new file added to the history can be stored in a
>   deltified form;
>
>  - When an object stored in the deltified representation is used, it
>   incurs more cost than using the same object stored whole in the plain
>   compressed representation.  The deltification mechanism makes a
>   trade-off that takes this runtime cost into account, as well as the
>   space efficiency.
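>
> (If you are curious how the objects in your own repository ended up
> being stored, running "git verify-pack -v" on the .idx file of a pack
> lists each deltified object together with its delta depth and base
> object, and ends with a summary of the chain lengths.)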
>
> The last of these points is probably not covered by the pack-heuristics
> IRC conversation with Linus recorded in that document.  Basically:
>
>  - A deltified object is stored as a (compressed) xdelta against some
>   base object.  If the best deltified representation we come up with is
>   larger than the result of just compressing the object without
>   deltification, it is not worth storing from the space consumption
>   point of view.  Thus, we originally said something like "if an
>   attempted delta is larger than half of the object size (assuming an
>   average 50% compression ratio), do not use the deltified
>   representation, it is not worth it"; a sketch below illustrates this
>   cutoff.  We attempt to delta against many base objects to pick the
>   best possible delta; the number of attempts is called the delta
>   window.
>
>  - The base object of a deltified object could itself be deltified, and
>   you may need to repeatedly apply deltas on top of some object that is
>   not a delta to get to the final object.  The length of this chain is
>   called the delta depth, and obviously you want to keep the delta
>   depth short to get reasonable runtime performance.  Thus, when
>   deltifying an object A, we make a weighted comparison between the
>   size of the delta to build it out of a base object of depth N and the
>   size of the delta to build it out of a base object of depth M.  A
>   slightly larger delta that is based on an object with a shallower
>   delta depth is favored over a smaller delta based on an object with a
>   much deeper delta depth; the second sketch below illustrates this
>   weighting.
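>
> Here is a rough sketch of that half-size cutoff, in C.  It is only an
> illustration of the rule of thumb described above, not the actual code
> in "git pack-objects" (which is more involved):
>
>   #include <stdio.h>
>
>   /*
>    * Decide whether a candidate delta is worth keeping.  Assuming the
>    * whole object would compress to roughly 50% of its size anyway, a
>    * delta larger than half the object saves no space over storing the
>    * object whole.
>    */
>   static int delta_is_worth_storing(unsigned long object_size,
>                                     unsigned long delta_size)
>   {
>           return delta_size < object_size / 2;
>   }
>
>   int main(void)
>   {
>           /* 1: the delta is small enough to keep */
>           printf("%d\n", delta_is_worth_storing(1000, 300));
>           /* 0: better to store the object whole */
>           printf("%d\n", delta_is_worth_storing(1000, 700));
>           return 0;
>   }
>
> The number of candidate bases tried per object is the delta window; it
> defaults to 10 and can be changed with the pack.window configuration
> variable or the --window option of "git repack".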
>
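> A similarly rough sketch of the depth weighting follows; the formula
> and the numbers are made up for illustration (the real heuristic in
> "git pack-objects" differs), but it shows why a slightly larger delta
> against a shallow base can win over a smaller delta against a deep
> base:
>
>   #include <math.h>
>   #include <stdio.h>
>
>   /*
>    * Inflate a candidate delta's size according to how deep the delta
>    * chain of its base object already is; max_depth stands for the
>    * chain limit (pack.depth, 50 by default).
>    */
>   static double weighted_size(unsigned long delta_size, int depth,
>                               int max_depth)
>   {
>           if (depth >= max_depth)
>                   return HUGE_VAL;  /* would exceed the chain limit */
>           return (double)delta_size * max_depth / (max_depth - depth);
>   }
>
>   int main(void)
>   {
>           /* A 110-byte delta against a base at depth 2... */
>           double shallow = weighted_size(110, 2, 50);
>           /* ...beats a 100-byte delta against a base at depth 45. */
>           double deep = weighted_size(100, 45, 50);
>           printf("shallow=%.1f deep=%.1f -> pick %s\n", shallow, deep,
>                  shallow < deep ? "shallow" : "deep");
>           return 0;
>   }
>
> The chain limit itself can be changed with pack.depth or the --depth
> option of "git repack".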
>
