Hi Tejun-

> On Jun 23, 2023, at 9:44 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hey,
>
> On Fri, Jun 23, 2023 at 02:37:17PM +0000, Chuck Lever III wrote:
>> I'm using NFS/RDMA for my test because I can drive more IOPS with it.
>>
>> I've found that setting the nfsiod and rpciod workqueues to "cpu"
>> scope provide the best benefit for this workload. Changing the
>> xprtiod workqueue to "cpu" had no discernible effect.
>>
>> This tracks with the number of queue_work calls for each of these
>> WQs. 59% of queue_work calls during the test are for the rpciod
>> WQ, 21% are for nfsiod, and 2% is for xprtiod.
>>
>> The same test with TCP (using IP-over-IB on the same physical network)
>> shows no improvement on any test. That suggests there is a bottleneck
>> somewhere else, when using TCP, that limits its throughput.
>
> Yeah, you can make the necessary workqueues to default to CPU or SMT scope
> using apply_workqueue_attrs(). The interface a bit cumbersome and we
> probably wanna add convenience helpers to switch e.g. affinity scopes but
> it's still just several lines of code.

static ssize_t wq_affn_scope_store(struct device *dev,
				   struct device_attribute *attr,
				   const char *buf, size_t count)
{
	struct workqueue_struct *wq = dev_to_wq(dev);
	struct workqueue_attrs *attrs;
	int affn, ret = -ENOMEM;

	affn = parse_affn_scope(buf);
	if (affn < 0)
		return affn;

	apply_wqattrs_lock();			<<< takes &wq_pool_mutex
	attrs = wq_sysfs_prep_attrs(wq);	<<< copies the wq_attrs
	if (attrs) {
		attrs->affn_scope = affn;
		ret = apply_workqueue_attrs_locked(wq, attrs);
	}
	apply_wqattrs_unlock();
	free_workqueue_attrs(attrs);
	return ret ?: count;
}

Both wq_pool_mutex and copy_workqueue_attrs() are static, so having
only apply_workqueue_attrs() is not yet enough to carry this off
in workqueue consumers such as sunrpc.ko.

It looks like padata_setup_cpumasks() for example is holding the
CPU read lock, but it doesn't take the wq_pool_mutex.
apply_wqattrs_prepare() has a "lockdep_assert_held(&wq_pool_mutex);".

I can wait for a v3 of this series so you can construct the public
API the way you prefer.

--
Chuck Lever