Re: contention on pwq->pool->lock under heavy NFS workload

Hi Tejun-


> On Jun 23, 2023, at 9:44 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> 
> Hey,
> 
> On Fri, Jun 23, 2023 at 02:37:17PM +0000, Chuck Lever III wrote:
>> I'm using NFS/RDMA for my test because I can drive more IOPS with it.
>> 
>> I've found that setting the nfsiod and rpciod workqueues to "cpu"
>> scope provides the best benefit for this workload. Changing the
>> xprtiod workqueue to "cpu" had no discernible effect.
>> 
>> This tracks with the number of queue_work calls for each of these
>> WQs. 59% of queue_work calls during the test are for the rpciod
>> WQ, 21% are for nfsiod, and 2% are for xprtiod.
>> 
>> The same test with TCP (using IP-over-IB on the same physical network)
>> shows no improvement on any test. That suggests there is a bottleneck
>> somewhere else, when using TCP, that limits its throughput.
> 
> Yeah, you can make the necessary workqueues default to CPU or SMT scope
> using apply_workqueue_attrs(). The interface is a bit cumbersome and we
> probably wanna add convenience helpers to switch e.g. affinity scopes but
> it's still just several lines of code.

static ssize_t wq_affn_scope_store(struct device *dev,
                                   struct device_attribute *attr,
                                   const char *buf, size_t count)
{
        struct workqueue_struct *wq = dev_to_wq(dev);
        struct workqueue_attrs *attrs;
        int affn, ret = -ENOMEM;

        affn = parse_affn_scope(buf);
        if (affn < 0)
                return affn;

        apply_wqattrs_lock();             /* takes &wq_pool_mutex */
        attrs = wq_sysfs_prep_attrs(wq);  /* copies the wq_attrs */
        if (attrs) {
                attrs->affn_scope = affn;
                ret = apply_workqueue_attrs_locked(wq, attrs);
        }
        apply_wqattrs_unlock();
        free_workqueue_attrs(attrs);
        return ret ?: count;
}

Both wq_pool_mutex and copy_workqueue_attrs() are static, so having
only apply_workqueue_attrs() is not yet enough to carry this off
in workqueue consumers such as sunrpc.ko.

It looks like padata_setup_cpumasks(), for example, holds the
CPU read lock, but it doesn't take the wq_pool_mutex.
apply_wqattrs_prepare() has a "lockdep_assert_held(&wq_pool_mutex);".

I can wait for a v3 of this series so you can construct the public
API the way you prefer.


--
Chuck Lever





