On Nov 24, 2008, at 1:38 PM, J. Bruce Fields wrote:
> On Wed, Nov 05, 2008 at 04:56:36PM -0500, Trond Myklebust wrote:
>> On Wed, 2008-11-05 at 16:51 -0500, Chuck Lever wrote:
>>> Here's a set of patches to change the NLM host cache to use a slab.
>> OK, I'll bite. Why would we care about slabifying the NLM host cache?
> There's some argument on the 5th patch:
> "Now that we have a couple of large text buffers in the nlm_host
> struct, the nlm_host cache should use a slab to reduce memory
> utilization and fragmentation on systems that manage a large
> number of NLM peers.
> "We keep these hardware cache-aligned to speed up the linked
> list search in nlm_lookup_host().
> "The overhead of creating a fresh nlm_host entry is also reduced
> using SLAB's init_once callback instead of using kzalloc()."
> Chuck, is there any hope of quantifying those improvements?
Using hardware performance counters, we can determine how hard the
TLB is exercised (in particular, how often it misses) during a typical
nlm_host entry lookup. We can also compare the average number of pages
needed to store a large number of nlm_host entries in the common
kmalloc-512 SLAB against the optimum number of pages consumed if the
entries all lived in their own SLAB. The fewer pages a lookup touches,
the fewer page translations the CPU has to handle.
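
To put rough numbers on that (purely illustrative, since the final
size of struct nlm_host isn't settled): if an entry lands in
kmalloc-512, a 4KB page holds at most 8 of them, so a server tracking
1,000 peers needs at least 125 pages just for host entries, and in
practice more, because those pages are shared with every other
allocation in that size class. A dedicated SLAB packs the entries next
to each other, so a walk of the host list touches something much
closer to that 125-page floor.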
On big systems it's easy to see how creating and expiring nlm_host
entries might contend with other users of the kmalloc-512 SLAB.
As we modify the nlm_host garbage collector, it will become somewhat
easier to release whole pages back to the page allocator when nlm_host
entries expire. If the host entries are mixed with other items on a
SLAB cache page, it's harder to respond to memory pressure in this way.
To truly assess the performance implications of this change, we need
to know how often the server calls nlm_lookup_host(). The client uses
it only during mount, so it's probably not consequential there. The
challenge is that such improvements would only reveal themselves on
extremely busy servers managing a large number of clients, and that
scenario is not easy to replicate in a lab setting.
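
For a test server, one quick-and-dirty way to get that number, without
touching lockd at all, is a throwaway kprobe module along these lines
(a sketch, not part of the series; it assumes nlm_lookup_host hasn't
been inlined away, and the counter isn't atomic, which is fine for a
rough estimate):

/*
 * Throwaway sketch: count calls to nlm_lookup_host() on a running
 * server so we can see how hot that path really is.
 */
#include <linux/module.h>
#include <linux/kprobes.h>

static unsigned long nlm_lookup_calls;

static int count_nlm_lookup(struct kprobe *kp, struct pt_regs *regs)
{
        nlm_lookup_calls++;
        return 0;
}

static struct kprobe nlm_lookup_probe = {
        .symbol_name    = "nlm_lookup_host",
        .pre_handler    = count_nlm_lookup,
};

static int __init nlm_lookup_count_init(void)
{
        return register_kprobe(&nlm_lookup_probe);
}

static void __exit nlm_lookup_count_exit(void)
{
        unregister_kprobe(&nlm_lookup_probe);
        pr_info("nlm_lookup_host was called %lu times\n",
                nlm_lookup_calls);
}

module_init(nlm_lookup_count_init);
module_exit(nlm_lookup_count_exit);
MODULE_LICENSE("GPL");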
It's also useful to have a separate SLAB so that debugging options,
like poisoning and extensive checking during kmem_cache_free(), can be
enabled on that cache without adversely impacting other areas of
kernel operation. Additionally, we can use /proc/slabinfo to watch
host cache statistics without adding any new kernel interfaces. All of
this will be useful for testing possible changes to the server-side
reference counting and garbage collection logic.
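
For what it's worth, the mechanical part boils down to something like
this (a sketch, not the actual patch; nlm_host_cachep and
nlm_host_init_once are illustrative names):

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/lockd/lockd.h>

static struct kmem_cache *nlm_host_cachep;      /* illustrative name */

/*
 * Runs once per object, when the slab page backing it is first
 * populated -- not on every allocation.  Objects are expected to be
 * returned to this state before kmem_cache_free(), which is what lets
 * the allocation path skip the full kzalloc()-style zeroing.
 */
static void nlm_host_init_once(void *obj)
{
        struct nlm_host *host = obj;

        memset(host, 0, sizeof(*host));
}

int nlm_host_cache_init(void)
{
        nlm_host_cachep = kmem_cache_create("nlm_host",
                                            sizeof(struct nlm_host),
                                            0, SLAB_HWCACHE_ALIGN,
                                            nlm_host_init_once);
        return nlm_host_cachep ? 0 : -ENOMEM;
}

void nlm_host_cache_shutdown(void)
{
        kmem_cache_destroy(nlm_host_cachep);
}

Once the cache exists, it shows up as its own "nlm_host" line in
/proc/slabinfo, and under SLUB poisoning can be turned on for just
this cache with something like slub_debug=P,nlm_host on the boot
command line.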
The only argument I've heard against doing this is that creating
unique SLABs is only for items that are typically quickly reused, like
RPC buffers. I don't find that a convincing reason not to SLAB-ify
the host cache. Quickly reused items are certainly one reason to
create a unique SLAB, but there are several SLABs in the kernel that
manage items that are potentially long-lived: the buffer head, dentry,
and inode caches come to mind.
Additionally, nlm_host entries can be turned around pretty quickly on
a busy server. This becomes more important if we decide to implement,
for example, an LRU "expired" list to help the garbage collector make
better choices about which host entries to toss; a sketch of that idea
follows.
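
Purely a sketch of the LRU idea, not proposed code: it assumes a new
h_lru list_head added to struct nlm_host, reuses the h_expires
timestamp that's already there, and both helpers are expected to run
under the mutex that already protects the host table.

#include <linux/list.h>
#include <linux/jiffies.h>
#include <linux/lockd/lockd.h>

/* Illustrative expiry interval; the real code would reuse whatever
 * lockd already uses. */
#define NLM_HOST_REAP_AFTER     (120 * HZ)

/* Oldest entries at the head, newest at the tail. */
static LIST_HEAD(nlm_expired_hosts);

/* Called when the last reference to a host goes away. */
static void nlm_host_mark_expired(struct nlm_host *host)
{
        host->h_expires = jiffies + NLM_HOST_REAP_AFTER;
        list_move_tail(&host->h_lru, &nlm_expired_hosts);
}

/* Garbage collector reaps oldest-first and can stop early. */
static void nlm_gc_expired_hosts(void)
{
        struct nlm_host *host, *next;

        list_for_each_entry_safe(host, next, &nlm_expired_hosts, h_lru) {
                if (time_before(jiffies, host->h_expires))
                        break;          /* everything after this is newer */
                list_del(&host->h_lru);
                nlm_destroy_host(host); /* existing teardown helper */
        }
}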
My feeling is that, overall, SLAB-ifying the host cache is only
slightly less useful than splitting it. The host cache already works
adequately for most typical NFS workloads, yet I haven't seen anyone
asking whether there is a convincing performance case for splitting
the cache.
If we are already in the vicinity, we should consider adding a unique
SLAB. It's easy to do, and provides other minor benefits. It will
certainly not make performance worse, adds little complexity, and
creates opportunities for other optimizations.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com