On Tue, 2019-05-28 at 15:03 -0400, Chuck Lever wrote:
> Following up on this. Now with even more data!
> 
> > On May 6, 2019, at 4:41 PM, Chuck Lever <chuck.lever@xxxxxxxxxx>
> > wrote:
> > 
> > > On May 3, 2019, at 7:18 AM, Trond Myklebust <trondmy@xxxxxxxxx>
> > > wrote:
> > > 
> > > Allow more time for softirqd
> > 
> > Have you thought about performance tests for this one?
> 
> I tested this series on my 12-core two-socket client using a variety
> of tests including iozone, fio, and fstests. The network under test
> is 56Gb InfiniBand (TCP uses IPoIB). I tested both TCP and RDMA.
> 
> With lock debugging and memory leak testing enabled, I did not see
> any functional regressions or new leaks or crashes. Thus IMO this
> series is "safe to apply."
> 
> With TCP, I saw no change in performance between a "stock" kernel
> and one with all five patches in this series applied, as, IIRC,
> you predicted.
> 
> The following discussion is based on testing with NFS/RDMA.
> 
> With RDMA, I saw an improvement of 5-10% in IOPS rate between the
> "stock" kernel and a kernel with the first four patches applied.
> When the fifth patch is applied, I saw IOPS throughput significantly
> worse than "stock" -- like 20% worse.
> 
> I also studied average RPC execution time (the "execute" metric)
> with the "stock" kernel, the one with four patches applied, and with
> the one where all five are applied. The workload is 100% 4KB READs
> with an iodepth of 1024 in order to saturate the transmit queue.
> 
> With four patches, the execute time is about 2.5 msec faster
> (average execution time is around 75 msec due to the large backlog
> this test generates). With five patches, it's slower than "stock"
> by 12 msec.
> 
> I also saw a 30 usec improvement in the average latency of
> xprt_complete_rqst with the four patch series.
> 
> As far as I can tell, the benefit of this series comes mostly from
> the third patch, which changes spin_lock_bh(&xprt->transport_lock)
> to spin_lock(&xprt->transport_lock). When the xprtiod work queue is
> lowered in priority in 5/5, that benefit vanishes.
> 
> I am still confused about why 5/5 is needed. I did not see any soft
> lockups without this patch applied when using RDMA. Is the issue
> with xprtsock's use of xprtiod for handling incoming TCP receives?
> 
> I still have some things I'd like to look at. One thing I haven't
> yet tried is looking at lock_stat, which would confirm or refute
> my theory that this is all about the transport_lock, for instance.

OK. I can drop 5/5. The issue there was not about soft lockups.
However, since we were previously running most soft irqs as part of
spin_unlock_bh(), the question was whether or not we would see more
of them needing to move to softirqd. As far as I can see, your answer
to that question is 'no' (at least for your system).

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
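
For readers following the locking discussion above, here is a minimal,
self-contained kernel-module sketch contrasting the two locking styles
the thread refers to. It is not the actual sunrpc patch; only
spin_lock_bh()/spin_lock() and the existence of xprt->transport_lock
come from the thread itself, and the demo_* names are made up purely
for illustration.

	/*
	 * Hypothetical demo module -- not the sunrpc code under
	 * discussion. Contrasts the _bh spinlock variants with plain
	 * spin_lock()/spin_unlock() on a private lock.
	 */
	#include <linux/module.h>
	#include <linux/spinlock.h>
	#include <linux/kernel.h>

	static DEFINE_SPINLOCK(demo_lock);
	static unsigned long demo_count;

	/*
	 * Old style: the _bh variants disable bottom-half (softirq)
	 * processing on the local CPU for the critical section. On
	 * unlock, any softirqs raised in the meantime are run right
	 * here, in the caller's context, before the function returns.
	 */
	static void demo_bh_style(void)
	{
		spin_lock_bh(&demo_lock);
		demo_count++;
		spin_unlock_bh(&demo_lock);	/* may run pending softirqs */
	}

	/*
	 * New style: a plain spin lock. Safe only if the lock is never
	 * taken from softirq context; otherwise a softirq interrupting
	 * the critical section on the same CPU could deadlock trying
	 * to take the lock again.
	 */
	static void demo_plain_style(void)
	{
		spin_lock(&demo_lock);
		demo_count++;
		spin_unlock(&demo_lock);	/* no softirq processing here */
	}

	static int __init demo_init(void)
	{
		demo_bh_style();
		demo_plain_style();
		pr_info("spinlock demo: count=%lu\n", demo_count);
		return 0;
	}

	static void __exit demo_exit(void)
	{
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");

The point the thread turns on is the unlock side: spin_unlock_bh()
re-enables bottom halves and, if softirqs are pending, processes them
in the caller's context, whereas a plain spin_unlock() does none of
that work. Dropping it on every transport_lock release is consistent
with the latency improvement Chuck measured, and it also explains
Trond's question above about whether more softirq work would then have
to migrate to ksoftirqd instead.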