Nicolas Pitre <nico@xxxxxxx> writes:

> On Mon, 10 Dec 2007, Jon Smirl wrote:
>
>> Running oprofile during my gcc repack shows this loop as the hottest
>> place in the code by far.
>
> Well, that is kind of expected.
>
>> I added some debug printfs which show that I have a 100,000+ run of
>> identical hash entries.  Processing the 100,000 entries also causes
>> RAM consumption to explode.
>
> That is impossible.  If you look at the code where those hash entries
> are created in create_delta_index(), you'll notice a hard limit of
> HASH_LIMIT (currently 64) is imposed on the number of identical hash
> entries.

On the other hand, if we have, say, laaaaaarge streaks of zeros, what
happens is that we have 64 hash entries covering them.  About 4096
bytes are compared forward, and then the comparison stops.  The code
then scans backwards looking for more zeros (and finds about 64k of
them before it stops) and folds them into the current compacted form.

Each of these backward scans (of which we have 64 in the worst case)
hits a different memory area.  So since we scan/compare areas of 64k
for each advance of 4k, we get an overscanning factor of 16 in the
worst case (a small stand-alone sketch of that arithmetic follows
below).

I am not sure whether this is what we are seeing here, and it still
would not explain the exploding memory usage, I think.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
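
For concreteness, here is a minimal stand-alone C sketch of the
arithmetic in the two paragraphs above.  It is not git's diff-delta.c;
it only plugs in the figures under discussion (HASH_LIMIT = 64, a
roughly 4 KiB forward comparison, a roughly 64 KiB backward scan) and
prints the resulting overscanning factor.  The 16 MiB streak size is
an arbitrary example value chosen for illustration.

/*
 * Back-of-the-envelope model of the worst case described above.
 * This is NOT git's actual delta code; the constants are the figures
 * from the discussion, everything else is made up for illustration.
 */
#include <stdio.h>

#define HASH_LIMIT      64          /* cap on identical hash entries */
#define FORWARD_BYTES   4096ULL     /* ~4 KiB compared before the comparison stops */
#define BACKWARD_BYTES  65536ULL    /* ~64 KiB rescanned backwards per match */

int main(void)
{
    unsigned long long run = 16ULL * 1024 * 1024;  /* 16 MiB streak of zeros */
    unsigned long long advances = run / FORWARD_BYTES;

    /* One ~64 KiB backward scan per 4 KiB advance: the factor-16 overscan. */
    unsigned long long rescanned = advances * BACKWARD_BYTES;

    printf("streak size:          %llu bytes\n", run);
    printf("bytes rescanned:      %llu bytes\n", rescanned);
    printf("overscanning factor:  %llu\n", rescanned / run);

    /*
     * Up to HASH_LIMIT candidates may be tried at each position in the
     * worst case, each hitting a different memory area, so the cache
     * behaviour is worse than the raw byte count suggests.
     */
    printf("worst-case candidates per position: %d\n", HASH_LIMIT);
    return 0;
}

Compiling and running this (e.g. gcc -o overscan overscan.c &&
./overscan) prints an overscanning factor of 16, matching the
worst-case estimate above; the HASH_LIMIT line is only a reminder that
up to 64 such candidates may be tried per position, each in a
different memory area.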