On Fri, Feb 08, 2008 at 07:12:44AM +0000, Junio C Hamano wrote: > Pierre Habouzit <madcoder@xxxxxxxxxx> writes: > > > I've trying to see if that optimization was used but I was somehow > > unable to find if it was the case, as the code is a bit tough :) > > > > I was wondering if the repacking window was using triangle inequality > > to discard trying some costly deltas (I assume that what costs the most > > in the repacking is computing the delta). I mean, if you consider the > > "size" of a delta, I'm almost sure that it's very near a distance. > > > > So assuming that we know the delta sizes between any pair of reference > > objects in the window, well, if an object we want to delta against the > > window Od are near one reference O1 enough, for each Oi in the window > > that holds: len(δ(O1, Oi)) > 2 * len(δ(Od, O1)), then it's not worth > > investigating. > > We do not keep track of the delta size matrix between delta-base > candidates in the window, but I presume we could. The storage > cost for doing so is very cheap (window^2 * size_t). But we do > not even compute the distance matrix fully (I'll mention the > reason why the above is not (window^2 * size_t / 2) later). > > 1-----------------------------i <= delta-base candidates > \ > \ > D <-- the target we are considering > > Your idea is that if we want to find the cheapest delta-base, > and after we find out that candidate #1 is close to our target D > and candidate #i is very far from candidate #1, then delta to > create D using candidate #i as the base would be much bigger. > If the distance space is Euclidean, that would be a nice > optimization. Well, Euclidean is too much, a simple Metric Space is enough IIRC. > The second reason is that the deltification is not symmetric. > If you define the "distance" between #1 and #i as "the size of > delta to reproduce #i using #1 as base", the distance between #1 > and #i is very different from the distance between #i and #1. Well that's the most obvious reason indeed and I totally missed that, dang. The delta is not near a distance enough. Too bad :) -- ·O· Pierre Habouzit ··O madcoder@xxxxxxxxxx OOO http://www.madism.org
Attachment:
pgp1mVWIJV8vQ.pgp
Description: PGP signature