Following up on this. Now with even more data!


> On May 6, 2019, at 4:41 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>> On May 3, 2019, at 7:18 AM, Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>>
>> Allow more time for softirqd
>
> Have you thought about performance tests for this one?

I tested this series on my 12-core two-socket client using a variety
of tests including iozone, fio, and fstests. The network under test
is 56Gb InfiniBand (TCP uses IPoIB). I tested both TCP and RDMA.

With lock debugging and memory leak testing enabled, I did not see
any functional regressions, new leaks, or crashes. Thus IMO this
series is "safe to apply."

With TCP, I saw no change in performance between a "stock" kernel and
one with all five patches in this series applied, as, IIRC, you
predicted.

The following discussion is based on testing with NFS/RDMA.

With RDMA, I saw a 5-10% improvement in IOPS rate between the "stock"
kernel and a kernel with the first four patches applied. When the
fifth patch is applied, IOPS throughput is significantly worse than
"stock" -- about 20% worse.

I also studied average RPC execution time (the "execute" metric) with
the "stock" kernel, the kernel with four patches applied, and the
kernel with all five applied. The workload is 100% 4KB READs with an
iodepth of 1024 in order to saturate the transmit queue.

With four patches, the execute time is about 2.5 msec faster (average
execution time is around 75 msec due to the large backlog this test
generates). With five patches, it's slower than "stock" by 12 msec.

I also saw a 30 usec improvement in the average latency of
xprt_complete_rqst with the four-patch series.

As far as I can tell, the benefit of this series comes mostly from
the third patch, which changes spin_lock_bh(&xprt->transport_lock) to
spin_lock(&xprt->transport_lock). When the xprtiod work queue is
lowered in priority in 5/5, that benefit vanishes.

I am still confused about why 5/5 is needed. I did not see any soft
lockups without this patch applied when using RDMA. Is the issue with
xprtsock's use of xprtiod for handling incoming TCP receives?

I still have some things I'd like to look at. One thing I haven't yet
tried is lock_stat, which would confirm or refute my theory that this
is all about the transport_lock.


>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> ---
>> net/sunrpc/sched.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index c7e81336620c..6b37c9a4b48f 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -1253,7 +1253,7 @@ static int rpciod_start(void)
>>  		goto out_failed;
>>  	rpciod_workqueue = wq;
>>  	/* Note: highpri because network receive is latency sensitive */
>
> The above comment should be deleted as well.
>
>> -	wq = alloc_workqueue("xprtiod", WQ_UNBOUND|WQ_MEM_RECLAIM|WQ_HIGHPRI, 0);
>> +	wq = alloc_workqueue("xprtiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
>>  	if (!wq)
>>  		goto free_rpciod;
>>  	xprtiod_workqueue = wq;
>
> --
> Chuck Lever


--
Chuck Lever
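
P.S. For readers not following the series closely, here is a minimal
sketch of the locking pattern I'm referring to in 3/5. This is not
the actual patch: the function names are made up, and the sketch
assumes (as I understand the series intends) that after 3/5 the
transport_lock is never taken from bh context, which is what makes
the plain spinlock safe.

    #include <linux/spinlock.h>
    #include <linux/sunrpc/xprt.h>  /* struct rpc_xprt */

    /* Before 3/5: the lock must be bh-safe, so every process-context
     * acquisition disables softirqs for the whole critical section.
     */
    static void example_before(struct rpc_xprt *xprt)
    {
            spin_lock_bh(&xprt->transport_lock);
            /* ... update transport state ... */
            spin_unlock_bh(&xprt->transport_lock);
    }

    /* After 3/5: a plain spinlock. Softirqs stay enabled while the
     * lock is held, saving a local_bh_disable()/local_bh_enable()
     * pair on every acquisition.
     */
    static void example_after(struct rpc_xprt *xprt)
    {
            spin_lock(&xprt->transport_lock);
            /* ... update transport state ... */
            spin_unlock(&xprt->transport_lock);
    }

If my transport_lock theory is right, a kernel built with
CONFIG_LOCK_STAT=y should show the difference in contention and hold
times for that lock in /proc/lock_stat.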