Re: [PATCH v3 1/3] Implement the patience diff algorithm

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 7 Jan 2009 21:00:40 +0100 (CET)

Hi,

On Wed, 7 Jan 2009, Linus Torvalds wrote:

> On Wed, 7 Jan 2009, Davide Libenzi wrote:
> > 
> > xdiff allows for diffing ranges, and the most efficent method is 
> > exactly how you did ;)
> 
> No, the performance problem is how it needs to re-hash everything. xdiff 
> doesn't seem to have any way to use a "subset of the hash", so what the 
> patience diff does is to use a subset of the mmfile, and then that will 
> force all the rehashing to take place, which is kind of sad.
> 
> It would be nice (for patience diff) if it could partition the _hashes_ 
> instead of partitioning the _data_. That way it wouldn't need to rehash. 
> See?

Actually, for libxdiff (non-patience), this is not possible, as 
xdl_cleanup_records() rewrites the hashes so that they are in a contiguous 
interval (0, ..., N-1), where N is the number of distinct lines found.

I am also pretty certain that at least the non-patience part needs the 
maximum of that "hash" to be as small as possible.

Having said that, if Linus likes patience diff enough to want it faster, 
Dscho will improve the speed by avoiding the rehashing for the patience 
diff part (although the lines for which patience has to fall back to Myers 
diff will _still_ rehash, but that does not hurt _that_ much in practice).

Ciao,
Dscho

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html