Re: [RFC] fix parallelism for rpc tasks

Olga Kornievskaia <aglo@xxxxxxxxx> · Wed, 5 Jul 2017 10:44:33 -0400

On Mon, Jul 3, 2017 at 10:58 AM, Trond Myklebust
<trondmy@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2017-06-29 at 09:25 -0400, Olga Kornievskaia wrote:
>> Hi folks,
>>
>> On a multi-core machine, is it expected that we can have parallel
>> RPCs handled by each of the per-core workqueues?
>>
>> While testing a read workload, I observe via the "top" command that a
>> single "kworker" thread is servicing the requests (no parallelism).
>> The effect is more prominent when running these operations over a
>> krb5p mount.
>>
>> Bruce suggested trying the change below, and in my testing the read
>> workload is then spread among all the kworker threads.
>>
>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 0cc8383..f80e688 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -1095,7 +1095,7 @@ static int rpciod_start(void)
>>  	 * Create the rpciod thread and wait for it to start.
>>  	 */
>>  	dprintk("RPC:       creating workqueue rpciod\n");
>> -	wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 0);
>> +	wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>>  	if (!wq)
>>  		goto out_failed;
>>  	rpciod_workqueue = wq;
>>
>
> WQ_UNBOUND turns off concurrency management on the thread pool (see
> Documentation/core-api/workqueue.rst). It also means we contend for the
> work-item queuing/dequeuing locks, since the threads that run the work
> items are not bound to a CPU.
>
> IOW: This is not a slam-dunk obvious gain.

I agree, but I think it's worth considering. I'm waiting to get real
performance numbers (rather than ones from my VM setup) to support my
case. However, a 90% degradation in read performance over krb5p has
been reported when a single CPU executes all the operations.

Is there another way to make sure that we can take advantage of all
available CPUs on a multiprocessor machine? Plain kernel threads
instead of a workqueue?

Can/should we have a WQ_UNBOUND workqueue for secure mounts and
another queue for other mounts?

While I wouldn't call the krb5 load long-running, the documentation
gives CPU-intensive workloads as an example use case for WQ_UNBOUND. It
also says that, in general, "work items are not expected to hog a CPU
and consume many cycles." How many is "too many"? And how many of these
operations are crypto operations?