Re: small question about the repack algorithm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 08, 2008 at 07:12:44AM +0000, Junio C Hamano wrote:
> Pierre Habouzit <madcoder@xxxxxxxxxx> writes:
> 
> >   I've trying to see if that optimization was used but I was somehow
> > unable to find if it was the case, as the code is a bit tough :)
> >
> >   I was wondering if the repacking window was using triangle inequality
> > to discard trying some costly deltas (I assume that what costs the most
> > in the repacking is computing the delta). I mean, if you consider the
> > "size" of a delta, I'm almost sure that it's very near a distance.
> >
> >   So assuming that we know the delta sizes between any pair of reference
> > objects in the window, well, if an object we want to delta against the
> > window Od are near one reference O1 enough, for each Oi in the window
> > that holds: len(δ(O1, Oi)) > 2 * len(δ(Od, O1)), then it's not worth
> > investigating.
> 
> We do not keep track of the delta size matrix between delta-base
> candidates in the window, but I presume we could.  The storage
> cost for doing so is very cheap (window^2 * size_t).  But we do
> not even compute the distance matrix fully (I'll mention the
> reason why the above is not (window^2 * size_t / 2) later).
> 
>     1-----------------------------i   <= delta-base candidates
>      \
>       \ 
>        D <-- the target we are considering
> 
> Your idea is that if we want to find the cheapest delta-base,
> and after we find out that candidate #1 is close to our target D
> and candidate #i is very far from candidate #1, then delta to
> create D using candidate #i as the base would be much bigger.
> If the distance space is Euclidean, that would be a nice
> optimization.

  Well, Euclidean is too much, a simple Metric Space is enough IIRC.

> The second reason is that the deltification is not symmetric.
> If you define the "distance" between #1 and #i as "the size of
> delta to reproduce #i using #1 as base", the distance between #1
> and #i is very different from the distance between #i and #1.

  Well that's the most obvious reason indeed and I totally missed that,
dang. The delta is not near a distance enough. Too bad :)

-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org

Attachment: pgp1mVWIJV8vQ.pgp
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux