On Wed, Apr 3, 2019 at 6:36 PM Jeff King <peff@xxxxxxxx> wrote:
> I suspect we could do even better by storing and reusing not just the
> original blob between diffs, but the intermediate diff state (i.e., the
> hashes produced by xdl_prepare(), which should be usable between
> multiple diffs). That's quite a bit more complex, though, and I imagine
> would require some surgery to xdiff.

Amazing. xdl_prepare_ctx and xdl_hash_record (called inside
xdl_prepare_ctx) account for 36% according to 'perf report'. Please
tell me you just did not get this on your first guess.

I tracked and dumped all the hashes that are sent to xdl_prepare() and
it looks like the amount of duplicates is quite high. There are only
about 1000 one-time hashes out of 7000 (didn't really draw a histogram
to examine closer). So yeah this looks really promising, assuming
somebody is going to do something about it.
--
Duy
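
For anyone who wants to repeat the duplicate count on their own dump, a
rough sketch of the tally is below. It is not the script used for the
numbers above. It assumes the hashes were collected into a plain list of
integers, the name summarize_hashes is made up for illustration, and the
sample values are invented. It reports both the total and the distinct
count, since "1000 out of 7000" can be read against either.

```python
from collections import Counter


def summarize_hashes(hashes):
    """Return (total, distinct, one_time) for an iterable of record hashes.

    total    -- how many hashes were seen, duplicates included
    distinct -- how many different hash values were seen
    one_time -- how many hash values were seen exactly once
    """
    counts = Counter(hashes)
    total = sum(counts.values())
    one_time = sum(1 for n in counts.values() if n == 1)
    return total, len(counts), one_time


# Invented sample values, not data from the run described above.
sample = [0x1A2B, 0x1A2B, 0x3C4D, 0x5E6F, 0x5E6F, 0x5E6F, 0x7A8B]
total, distinct, one_time = summarize_hashes(sample)
print(f"{total} hashes, {distinct} distinct, {one_time} seen only once")
```

A low one_time figure relative to total means most records hash to a
value that has already been seen, which is the situation where reusing
intermediate state would be expected to help.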