Re: Performance problem, long run of identical hashes

On Mon, 10 Dec 2007, David Kastrup wrote:

> Nicolas Pitre <nico@xxxxxxx> writes:
> 
> > On Mon, 10 Dec 2007, Jon Smirl wrote:
> >
> >> I added some debug printfs which show that I
> >> have a 100,000+ run of identical hash entries. Processing the 100,000
> >> entries also causes RAM consumption to explode.
> >
> > That is impossible.  If you look at the code where those hash entries 
> > are created in create_delta_index(), you'll notice that a hard limit of 
> > HASH_LIMIT (currently 64) is imposed on the number of identical hash 
> > entries.
> 
> On the other hand, if we have, say, laaaaaarge streaks of zeros, what
> happens is that we have 64 hashes seeing them.  Now about 4096 bytes are
> compared, and then the comparison stops.

No.

What would happen in that case is that the first hash entry pointing 
somewhere near the beginning of the zero chunk will match _at least_ 
4096 bytes, and probably much more, up to the end of the buffer if 
possible.
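
To illustrate, here is a minimal sketch (my own illustrative code and 
names, not the actual diff-delta.c implementation) of how a single hash 
entry can match far more than 4096 bytes:

#include <stddef.h>

/* Hypothetical helper: extend a match forward from a hash entry at
 * ref[roff] against tgt[toff].  On a long run of zeroes the loop only
 * stops at a mismatch or the end of a buffer, so one entry can cover
 * megabytes in a single pass. */
size_t extend_forward(const unsigned char *ref, size_t ref_size, size_t roff,
                      const unsigned char *tgt, size_t tgt_size, size_t toff)
{
	size_t n = 0;
	while (roff + n < ref_size && toff + n < tgt_size &&
	       ref[roff + n] == tgt[toff + n])
		n++;
	return n;
}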

> Then it scans backwards to
> seek for more zeros (and finds about 64k of them before it stops)

The first hash entry for those zeroes is going to point at an offset no 
more than 15 bytes from the start of the zero area, since index entries 
are created every 16 bytes.  So, unless both buffers already match 
before that zero area, the backward scan will stop quickly.
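
Again as a sketch (a hypothetical helper, same caveats as above): the 
backward scan has at most about 15 bytes of zeroes behind the entry to 
consume before it leaves the run, so it ends almost immediately:

#include <stddef.h>

/* Hypothetical counterpart: extend the match backwards.  The entry
 * sits at most 15 bytes past the start of the zero run (one entry per
 * 16-byte window), so unless the buffers already matched before the
 * run, this loop terminates after a handful of iterations. */
size_t extend_backward(const unsigned char *ref, size_t roff,
                       const unsigned char *tgt, size_t toff)
{
	size_t n = 0;
	while (n < roff && n < toff && ref[roff - n - 1] == tgt[toff - n - 1])
		n++;
	return n;
}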

> folds them into the current compacted form.  Each of these backward
> scans (of which we have 64 in the worst case) is in a different memory
> area.  So since we scan/compare areas of 64k for each advance of 4k, we
> have an overscanning factor of 16 (for a worst case scenario).

Actually, if we match 100MB of zeroes, we'll just encode that as a 
sequence of 64KB copy operations, and continue scanning the buffer 
after that 100MB.
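
For concreteness, a sketch of that encoding step (MAX_COPY and 
emit_copies are my own illustrative names, assuming the 64KB cap per 
copy operation described above).  At 64KB apiece, 100MB folds into just 
1600 trivial copy records, with nothing in the matched region rescanned:

#include <stdio.h>

#define MAX_COPY 0x10000	/* assumed 64KB cap per copy operation */

/* Emit one long match as a run of capped copy records. */
void emit_copies(size_t offset, size_t len)
{
	while (len) {
		size_t n = len < MAX_COPY ? len : MAX_COPY;
		printf("copy off=%zu len=%zu\n", offset, n);
		offset += n;
		len -= n;
	}
}

int main(void)
{
	emit_copies(0, (size_t)100 << 20);	/* 100MB match -> 1600 copies */
	return 0;
}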

So I don't see where your figure of scanning/comparing 64k areas for 
each 4k advance comes from.

> Not sure whether this is what we are seeing here.  It would still not
> explain the exploding memory usage, I think.

Right.


Nicolas
