Re: Horrible re-packing?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 5 Jun 2006, Chris Wedgwood wrote:
>
> On Mon, Jun 05, 2006 at 05:22:02PM -0400, Nicolas Pitre wrote:
> 
> > Much more expensive for both memory usage and CPU cycles.
> 
> On a modern CPU I'm not sure you would notice unless the dataset was
> insanely large.

The dataset really _is_ pretty large.

For the kernel, we're talking about a quarter million strings, and it's 
not shrinking. Other (CVS imported from long histories) are in the several 
million object range.

The real problem, btw, is not the CPU cost of sorting them (hey, O(nlogn) 
works ok), but the memory use. We have to keep those quarter million names 
in memory too. At a "pointer + average of ~30 bytes of full pathname 
allocation + malloc overhead", the strings would basically take about 40 
bytes of memory each, and cause constant cache-misses.  In contrast, the 
"hash" value is a 32-bit unsigned, with no pointer overhead.

It's not the biggest part to keep track of (we need to obviously keep 
track of the 20-byte SHA1 and the pointers between objects), but if we had 
the full string info, it would be quite noticeable overhead, on the order 
of several tens of megabytes. 

Now, "tens of megabytes" is not a make-or-break issue any more (you 
definitely want lots of memory if you want to develop with large trees), 
but in a very real sense, the memory footprint of an app is often very 
closely correlated with its performance these days. 

Trying to keep things within the L2 cache can help a lot, and even if we 
expect 2MB and 4MB L2's to get more and more common, it means that we 
should _strive_ to keep datasets down. As it is, we have no _chance_ of 
staying in the L2 cache on the kernel, but for smaller projects we 
hopefully can still do so ;)

			Linus
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]