I recently backported Neil's lwq code and sunrpc server changes to our 5.15.130-based kernel in the hope of improving the performance of our data servers.

Our performance team recently ran a fio workload on a client that was doing 100% NFSv3 reads in O_DIRECT mode over an RDMA connection (InfiniBand) against the resulting server. I've attached the flame graph from a perf profile run on the server side.

Is anyone else seeing this massive contention for the spin lock in __lwq_dequeue? As you can see, it appears to be dwarfing all the other nfsd activity on the system in question, accounting for 45% of all the perf hits.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx

Attachment: 5.15.130-200.pd.124.el8.x86_64.dsx.svg