On Apr 22, 2006, at 08:45, Nicolas Pitre wrote:
> The idea to avoid memory pressure is to reverse the window processing
> such that the object to delta against is constant for the entire window
> instead of the current logic where the target object is constant. This
> way there would be only one index in memory at all times.
Right, this is essential. In my measurements, diff-delta spends
about 70% of its time generating the index and 30% matching.
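
To make the reuse concrete, here is a rough C sketch of the reversed
loop. The helpers (build_delta_index, try_delta_against_index,
free_delta_index) and the entry structure are made up for illustration,
not the actual diff-delta interface; the point is just "index the
reference object once, then match every window entry against it":

#include <stdlib.h>

struct delta_index;  /* opaque index over the reference buffer */

/* hypothetical helpers, for illustration only */
extern struct delta_index *build_delta_index(const void *buf, unsigned long size);
extern void *try_delta_against_index(const struct delta_index *idx,
                                     const void *trg, unsigned long trg_size,
                                     unsigned long *delta_size);
extern void free_delta_index(struct delta_index *idx);

struct entry {
        void *buf;
        unsigned long size;
        void *best_delta;          /* smallest delta found so far */
        unsigned long best_size;
};

/*
 * Reversed window processing: build the index for the reference
 * object once (the 0.7 units), then match every target in the
 * window against it (0.3 units each), keeping the smallest delta.
 */
static void scan_window(struct entry *ref, struct entry *win, int n)
{
        struct delta_index *idx = build_delta_index(ref->buf, ref->size);
        int i;

        for (i = 0; i < n; i++) {
                unsigned long dsize;
                void *d = try_delta_against_index(idx, win[i].buf,
                                                  win[i].size, &dsize);
                if (d && (!win[i].best_delta || dsize < win[i].best_size)) {
                        free(win[i].best_delta);
                        win[i].best_delta = d;
                        win[i].best_size = dsize;
                } else {
                        free(d);
                }
        }
        free_delta_index(idx);
}

Only one index is live at any time, which addresses the memory-pressure
point above, and each target just hangs on to its current best delta.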
Right now, with 10 candidates per file, we do 11 units of work,
since we also repeat the final delta. When reusing the index, and
keeping the smallest delta around, we'd use 0.7 + 3 = 3.7 units of
work. This is almost a 3x speedup.
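
Spelling out that accounting (one unit = one full index build plus one
match, per the 70/30 split above):

  today:     10 candidates x (0.7 + 0.3)     = 10.0
             + recomputing the chosen delta  =  1.0
             total                           = 11.0

  reusing:   1 index build                   =  0.7
             + 10 matches x 0.3              =  3.0
             total                           =  3.7   (11.0 / 3.7 ~ 3x)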
There's no way we can get decent performance without this.
With the similarity fingerprints, another factor of 2 should be
attainable by only considering the 3 files with the nearest
fingerprints.
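
For illustration, one possible shape of that pre-selection in C.
similarity_fingerprint() and the distance metric are hypothetical
stand-ins, not an existing interface; the idea is just to rank the
window by fingerprint distance and hand only the nearest 3 entries to
the real delta matcher:

#include <stdlib.h>

#define NEAR_CANDIDATES 3

/* hypothetical similarity fingerprint, e.g. a small feature vector */
struct fingerprint { unsigned int v[4]; };

extern struct fingerprint similarity_fingerprint(const void *buf,
                                                 unsigned long size);

/* smaller means "more similar"; the actual metric is left open here */
static unsigned int fingerprint_distance(const struct fingerprint *a,
                                         const struct fingerprint *b)
{
        unsigned int i, d = 0;
        for (i = 0; i < 4; i++)
                d += a->v[i] > b->v[i] ? a->v[i] - b->v[i] : b->v[i] - a->v[i];
        return d;
}

struct candidate {
        void *buf;
        unsigned long size;
        unsigned int dist;         /* distance to the reference fingerprint */
};

static int by_distance(const void *a, const void *b)
{
        const struct candidate *ca = a, *cb = b;
        return (ca->dist > cb->dist) - (ca->dist < cb->dist);
}

/*
 * Rank the window by fingerprint distance to the reference object and
 * return how many entries (at most 3) to run the delta matcher on.
 * Sorting the window in place is just for brevity in this sketch.
 */
static int select_candidates(const struct fingerprint *ref,
                             struct candidate *win, int n)
{
        int i;
        for (i = 0; i < n; i++) {
                struct fingerprint f =
                        similarity_fingerprint(win[i].buf, win[i].size);
                win[i].dist = fingerprint_distance(ref, &f);
        }
        qsort(win, n, sizeof(*win), by_distance);
        return n < NEAR_CANDIDATES ? n : NEAR_CANDIDATES;
}

Cutting the 10 matches down to 3 turns the 3.7 units above into roughly
0.7 + 3 x 0.3 = 1.6, which is where the extra ~2x comes from.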
-Geert