On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
> > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@xxxxxxxxx>
> > wrote:
> >
> > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
> > > > Hi folks,
> > > >
> > > > On a multi-core machine, is it expected that we can have
> > > > parallel RPCs handled by each of the per-core workqueues?
> > > >
> > > > In testing a read workload, I observe via the "top" command
> > > > that a single "kworker" thread is running, servicing the
> > > > requests (no parallelism). It's more prominent while doing
> > > > these operations over a krb5p mount.
> > > >
> > > > Bruce suggested trying the change below, and with it my
> > > > testing shows the read workload spread among all the kworker
> > > > threads.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > >
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index 0cc8383..f80e688 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
> > > >  	 * Create the rpciod thread and wait for it to start.
> > > >  	 */
> > > >  	dprintk("RPC:       creating workqueue rpciod\n");
> > > > -	wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
> > > > +	wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
> > > >  	if (!wq)
> > > >  		goto out_failed;
> > > >  	rpciod_workqueue = wq;
> > >
> > > WQ_UNBOUND turns off concurrency management on the thread pool
> > > (see Documentation/core-api/workqueue.rst). It also means we
> > > contend for the work item queuing/dequeuing locks, since the
> > > threads that run the work items are not bound to a CPU.
> > >
> > > IOW: this is not a slam-dunk obvious gain.
> >
> > I agree, but I think it's worth consideration.
> > I'm waiting to get (real) performance numbers for the improvement
> > (instead of my VM setup) to help my case. However, a 90%
> > degradation in read performance over krb5p was reported when one
> > CPU is executing all the ops.
> >
> > Is there a different way to make sure that on a multi-processor
> > machine we can take advantage of all available CPUs? Simple kernel
> > threads instead of a work queue?
>
> There is a trade-off between spreading the work and ensuring it is
> executed on a CPU close to the I/O and the application. IMO UNBOUND
> is a good way to do that. UNBOUND will attempt to schedule the work
> on the preferred CPU, but allows it to be migrated if that CPU is
> busy.
>
> The advantage of this is that when the client workload is CPU
> intensive (say, a software build), RPC client work can be scheduled
> and run more quickly, which reduces latency.

That should no longer be a huge issue, since queue_work() now
defaults to the WORK_CPU_UNBOUND flag, which prefers the local CPU
but will schedule elsewhere if the local CPU is congested.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx