Re: git annotate runs out of memory

Davide Libenzi <davidel@xxxxxxxxxxxxxxx> · Tue, 11 Dec 2007 17:12:21 -0800 (PST)

On Tue, 11 Dec 2007, Linus Torvalds wrote:

> > Libxdiff already has a xdl_trim_ends() that strips all the common 
> > beginning and ending records, but at that point files are already loaded.
> 
> That's not the problem. The problem with xdl_trim_ends() is that it 
> happens *after* you have done all the hashing, so as an optimization it's 
> fairly useless, because it still leaves the real cost (the per-line 
> hashing) on the table.

Careful. The real cost of diffing, is not the O(1) pass of the prepare 
phase. It's the potentially O(N*M) worst case of the cross-record compare. 
So that optimization is far from useless. That optimization is indeed 
mainly targeted to avoid such worst case.

> So doing the trimming of the ends before you do even that, allows you to 
> just do the trivial "let's see if the ends are identical" with a plain 
> memcmp, which is much faster.

Yes, tail trimming done on a block-basis is faster and does not consume 
memory. The code for libxdiff would have to be a bit more complex though, 
since memory files can be composed by many sections, of different sizes 
(so you cannot just assume it's a single block you're trimming the end). 
Also, you'd need some code at the end that hands you back at least the N 
lines you want for context.

- Davide

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html