On Wed, 2017-07-05 at 11:11 -0400, Chuck Lever wrote:
> > On Jul 5, 2017, at 10:44 AM, Olga Kornievskaia <aglo@xxxxxxxxx>
> > wrote:
> >
> > On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
> > > > Hi folks,
> > > >
> > > > On a multi-core machine, is it expected that we can have
> > > > parallel RPCs handled by each of the per-core workqueues?
> > > >
> > > > In testing a read workload, I observe via the "top" command
> > > > that a single "kworker" thread is running, servicing the
> > > > requests (no parallelism). It's more prominent while doing
> > > > these operations over a krb5p mount.
> > > >
> > > > Bruce suggested trying the change below, and with it my
> > > > testing shows the read workload spread among all the kworker
> > > > threads.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > >
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index 0cc8383..f80e688 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
> > > >  	 * Create the rpciod thread and wait for it to start.
> > > >  	 */
> > > >  	dprintk("RPC:       creating workqueue rpciod\n");
> > > > -	wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
> > > > +	wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
> > > >  	if (!wq)
> > > >  		goto out_failed;
> > > >  	rpciod_workqueue = wq;
> > >
> > > WQ_UNBOUND turns off concurrency management on the thread pool
> > > (see Documentation/core-api/workqueue.rst). It also means we
> > > contend for the work item queuing/dequeuing locks, since the
> > > threads that run the work items are not bound to a CPU.
> > >
> > > IOW: this is not a slam-dunk obvious gain.
> >
> > I agree, but I think it's worth consideration.
> > I'm waiting to get (real) performance numbers for the improvement
> > (instead of my VM setup) to help my case. However, a 90%
> > degradation in read performance over krb5p was reported when one
> > CPU is executing all the ops.
> >
> > Is there a different way to make sure that on a multi-processor
> > machine we can take advantage of all available CPUs? Simple kernel
> > threads instead of a work queue?
>
> There is a trade-off between spreading the work and ensuring it is
> executed on a CPU close to the I/O and the application. IMO UNBOUND
> is a good way to do that. UNBOUND will attempt to schedule the work
> on the preferred CPU, but allows it to be migrated if that CPU is
> busy.
>
> The advantage of this is that when the client workload is CPU
> intensive (say, a software build), RPC client work can be scheduled
> and run more quickly, which reduces latency.

That should no longer be a huge issue, since queue_work() now
defaults to the WORK_CPU_UNBOUND flag, which prefers the local CPU
but will schedule elsewhere if the local CPU is congested.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx