On Tue, 25 Nov 2014 19:09:41 -0500
"J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Tue, Nov 25, 2014 at 04:25:57PM -0500, Jeff Layton wrote:
> > On Fri, 21 Nov 2014 14:19:27 -0500
> > Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > 
> > > Hi Bruce!
> > > 
> > > Here are the patches that I had mentioned earlier that reduce the
> > > contention for the pool->sp_lock when the server is heavily loaded.
> > > 
> > > The basic problem is that whenever a svc_xprt needs to be queued up
> > > for servicing, we have to take the pool->sp_lock to try and find an
> > > idle thread to service it. On a busy server, that lock becomes
> > > highly contended and that limits the throughput.
> > > 
> > > This patchset fixes this by changing how we search for an idle
> > > thread. First, we convert svc_rqst and the sp_all_threads list to
> > > be RCU-managed. Then we change the search for an idle thread to use
> > > the sp_all_threads list, which can now be done under the
> > > rcu_read_lock. When there is an available thread, queueing an xprt
> > > to it can now be done without any spinlocking.
> > > 
> > > With this, we see a pretty substantial increase in performance on a
> > > larger-scale server that is heavily loaded. Chris has some
> > > preliminary numbers, but they need to be cleaned up a bit before we
> > > can present them. I'm hoping to have those by early next week.
> > > 
> > > Jeff Layton (4):
> > >   sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
> > >   sunrpc: fix potential races in pool_stats collection
> > >   sunrpc: convert to lockless lookup of queued server threads
> > >   sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
> > > 
> > >  include/linux/sunrpc/svc.h    |  12 +-
> > >  include/trace/events/sunrpc.h |  98 +++++++++++++++-
> > >  net/sunrpc/svc.c              |  17 +--
> > >  net/sunrpc/svc_xprt.c         | 252 ++++++++++++++++++++++++------------------
> > >  4 files changed, 258 insertions(+), 121 deletions(-)
> > 
> > Here's what I've got so far.
> > 
> > This is just a chart that shows the % increase in the number of iops
> > in a distributed test on an NFSv3 server with this patchset vs.
> > without.
> > 
> > The numbers along the bottom show the number of total job threads
> > running. Chris says:
> > 
> > "There were 64 nfsd threads running on the server.
> > 
> > There were 7 hypervisors running 2 VMs each running 2 and 4 threads
> > per VM. Thus, 56 and 112 threads total."
> 
> Thanks!

Good questions all around. I'll try to answer them as best I can:

> Results that someone else could reproduce would be much better.
> (Where's the source code for the test?

The test is just fio (which is available in the Fedora repos, fwiw):

    http://git.kernel.dk/?p=fio.git;a=summary

...but we'd have to ask Chris for the job files. Chris, can those be
released?

> What's the base the patchset was applied to?

The base was a v3.14-ish kernel with a pile of patches on top (mostly,
the ones that Trond asked you to merge for v3.18). The only difference
between the "baseline" and "patched" kernels is this set, plus a few
patches from upstream that made it apply more cleanly. None of those
should have much effect on the results, though.

> What was the hardware?

Again, I'll have to defer that question to Chris. I don't know much
about the hw in use here, other than that it has some pretty fast
storage (high perf. SSDs).

> I understand that's a lot of information.)  But it's nice to see some
> numbers at least.
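
In case it helps to see the shape of the change without digging through
the patches, the wakeup fast path after this series ends up looking
roughly like the sketch below. This is a simplified illustration rather
than the actual hunk from net/sunrpc/svc_xprt.c: the helper name here
is made up, RQ_BUSY/rq_flags stand in for the per-thread "busy" flag
the series adds, and the real code also has to deal with memory
barriers, pool stats, the new tracepoints, and the fallback when no
thread is idle. It does show the basic idea from the cover letter,
though: walk sp_all_threads under rcu_read_lock() and claim an idle
thread by atomically setting its busy flag, without touching
pool->sp_lock on this path.

#include <linux/types.h>
#include <linux/bitops.h>
#include <linux/rculist.h>
#include <linux/sched.h>
#include <linux/sunrpc/svc.h>
#include <linux/sunrpc/svc_xprt.h>

/*
 * Simplified sketch of the lockless enqueue fast path. Illustration
 * only, not the code from the series; error handling, memory barriers,
 * stats and tracepoints are omitted.
 */
static bool svc_try_wake_idle_thread(struct svc_pool *pool,
				     struct svc_xprt *xprt)
{
	struct svc_rqst *rqstp;

	rcu_read_lock();
	list_for_each_entry_rcu(rqstp, &pool->sp_all_threads, rq_all) {
		/*
		 * Atomically claim an idle thread. If the busy bit was
		 * already set, this thread is taken, so keep looking.
		 */
		if (test_and_set_bit(RQ_BUSY, &rqstp->rq_flags))
			continue;

		/* Hand the transport to the thread we claimed and wake it. */
		svc_xprt_get(xprt);
		rqstp->rq_xprt = xprt;
		wake_up_process(rqstp->rq_task);
		rcu_read_unlock();
		return true;
	}
	rcu_read_unlock();

	/*
	 * No idle thread found: the caller falls back to queueing the
	 * xprt on the pool, which is where pool->sp_lock still gets
	 * taken.
	 */
	return false;
}

The point is that the common case (at least one nfsd is idle) no longer
serializes on the per-pool lock, which is exactly where the contention
showed up on busy servers.
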
> 
> (I wonder what the reason is for the odd shape in the 112-thread case
> (descending slightly as the writes decrease and then shooting up when
> they go to zero.) OK, I guess that's what you get if you just assume
> read-write contention is expensive and one write is slightly more
> expensive than one read. But then why doesn't it behave the same way
> in the 56-thread case?)

Yeah, I wondered about that too. There is some virtualization in use on
the clients here (and it's vmware too), so I have to wonder if there's
some variance in the numbers due to weirdo virt behaviors or something.

The good news is that the overall trend pretty clearly shows a
performance increase. As always, benchmark results point out the need
for more benchmarks.

-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>