Re: [PATCH 0/6] SLAB-ify nlm_host cache

On Nov 24, 2008, at 1:38 PM, J. Bruce Fields wrote:
> On Wed, Nov 05, 2008 at 04:56:36PM -0500, Trond Myklebust wrote:
>> On Wed, 2008-11-05 at 16:51 -0500, Chuck Lever wrote:
>>> Here's a set of patches to change the NLM host cache to use a slab.

>> OK, I'll bite. Why would we care about slabifying the NLM host cache?

> There's some argument on the 5th patch:

	"Now that we have a couple of large text buffers in the nlm_host
	struct, the nlm_host cache should use a slab to reduce memory
	utilization and fragmentation on systems that manage a large
	number of NLM peers.

	"We keep these hardware cache-aligned to speed up the linked
	list search in nlm_lookup_host().

	"The overhead of creating a fresh nlm_host entry is also reduced
	using SLAB's init_once callback instead of using kzalloc()."

> Chuck, is there any hope of quantifying those improvements?

Using hardware performance counters, we can measure how often the TLB is updated during a typical nlm_host entry lookup. We can also compare the average number of pages needed to hold a large number of nlm_host entries in the generic kmalloc-512 SLAB against the optimal number of pages consumed if the entries all lived in their own SLAB. The fewer pages touched per lookup, the fewer page translations the CPU has to handle.
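
To make the packing argument concrete, here's a back-of-envelope comparison (user-space C). The 300-byte object size is an assumption for illustration only; the real sizeof(struct nlm_host) depends on kernel configuration and architecture.

/*
 * Back-of-envelope illustration: per-page packing of nlm_host entries
 * in the generic kmalloc-512 cache versus a dedicated, cacheline-
 * aligned slab.  The 300-byte object size is assumed, not measured.
 */
#include <stdio.h>

int main(void)
{
	const unsigned int page_size = 4096;
	const unsigned int cacheline = 64;
	const unsigned int assumed_host_size = 300;	/* assumption */

	/* kmalloc rounds up to the next power-of-two bucket */
	unsigned int kmalloc_obj = 512;

	/* SLAB_HWCACHE_ALIGN rounds up to the next cache line */
	unsigned int slab_obj =
		(assumed_host_size + cacheline - 1) / cacheline * cacheline;

	printf("kmalloc-512:    %u entries per page\n",
	       page_size / kmalloc_obj);
	printf("dedicated slab: %u entries per page (%u-byte objects)\n",
	       page_size / slab_obj, slab_obj);
	return 0;
}

With those assumed numbers, that's 8 entries per page in kmalloc-512 versus 12 per page in a dedicated slab, which is the kind of difference we'd want to confirm with real object sizes.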

On big systems it's easy to see how creating and expiring nlm_host entries might contend with other users of the kmalloc-512 SLAB.

As we modify the nlm_host garbage collector, it will become somewhat easier to release whole pages back to the page allocator when nlm_host entries expire. If the host entries are mixed with other items on a SLAB cache page, it's harder to respond to memory pressure in this way.

To truly assess the performance implications of this change, we need to know how often the server calls nlm_lookup_host(). The client uses it only at mount time, so it's probably not consequential there. The challenge is that any improvement would only reveal itself on extremely busy servers managing a large number of clients, a scenario that is not easy to replicate in a lab setting.
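
If we wanted a rough number, a trivial bit of instrumentation would do. This fragment is hypothetical, not part of the patch series; the counter name and placement are invented for illustration:

/*
 * Hypothetical instrumentation: count calls to nlm_lookup_host() so
 * the lookup rate on a busy server can be sampled later (via printk,
 * a debugfs file, or similar).
 */
static atomic_long_t nlm_lookup_count = ATOMIC_LONG_INIT(0);

	/* at the top of the existing nlm_lookup_host(): */
	atomic_long_inc(&nlm_lookup_count);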

It's also useful to have a separate SLAB so we can enable debugging options on just that cache, such as poisoning and extensive checking during kmem_cache_free(), without adversely impacting other areas of kernel operation. Additionally, we can watch host cache statistics via /proc/slabinfo without adding any new kernel interfaces. All of this will be useful for testing possible changes to the server-side reference counting and garbage collection logic.
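
For the sake of discussion, here's a minimal sketch of what the dedicated cache might look like. The names and the constructor contents are illustrative, not lifted from the actual patches:

/*
 * Minimal sketch of a dedicated nlm_host cache.  Note that init_once
 * runs only when a slab page is first populated, so any field that
 * must be clean on every allocation still has to be reset at alloc
 * or free time.
 */
static struct kmem_cache *nlm_host_cachep;

static void nlm_host_init_once(void *obj)
{
	struct nlm_host *host = obj;

	memset(host, 0, sizeof(*host));
	INIT_LIST_HEAD(&host->h_lockowners);
	spin_lock_init(&host->h_lock);
}

static int __init nlm_host_cache_init(void)
{
	nlm_host_cachep = kmem_cache_create("nlm_host",
					    sizeof(struct nlm_host), 0,
					    SLAB_HWCACHE_ALIGN,
					    nlm_host_init_once);
	return nlm_host_cachep ? 0 : -ENOMEM;
}

Once the cache has its own name, a simple "grep nlm_host /proc/slabinfo" is enough to watch its statistics.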

The only argument I've heard against doing this is that unique SLABs are meant for items that are typically reused quickly, like RPC buffers. I don't find that a convincing reason not to SLAB-ify the host cache. Quick reuse is certainly one reason to create a unique SLAB, but several SLABs in the kernel manage items that are potentially long-lived: the buffer head, dentry, and inode caches come to mind.

Additionally, nlm_host entries can be turned around pretty quickly on a busy server. That will matter even more if we decide to implement, for example, an LRU list of expired entries to help the garbage collector make better choices about which host entries to toss.
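
Something along these lines, purely as a sketch; the h_lru field and the touch helper are invented for illustration and don't exist in struct nlm_host today:

/*
 * Hypothetical LRU of nlm_host entries so the garbage collector can
 * evict the coldest first.
 */
static LIST_HEAD(nlm_host_lru);
static DEFINE_SPINLOCK(nlm_host_lru_lock);

/* Call on each successful lookup: most recently used moves to the tail. */
static void nlm_host_touch(struct nlm_host *host)
{
	spin_lock(&nlm_host_lru_lock);
	list_move_tail(&host->h_lru, &nlm_host_lru);
	spin_unlock(&nlm_host_lru_lock);
}

/* The garbage collector would then walk from the head, tossing the
 * least recently used entries first. */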

My feeling is that, overall, SLAB-ifying the host cache is only slightly less useful than splitting it. The host cache already works adequately for most typical NFS workloads, and I haven't seen anyone ask whether there is a convincing performance case for splitting the cache, either.

Since we're already working in this area, we should consider adding a unique SLAB. It's easy to do and provides other minor benefits: it certainly won't make performance worse, it adds little complexity, and it creates opportunities for further optimizations.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
