On 3/14/25 6:43 AM, Robin Murphy wrote:
> On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
> [...]
>> system: Suermicro AS-4124GS-TNR
>> cpu: AMD EPYC 7H12 64-Core Processor
>> ram: 512G
>> rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
>>
>>
>>>> [  976.677373] __dma_map_sg_attrs+0x139/0x1b0
>>>> [  976.677380] dma_map_sgtable+0x21/0x50
>>>
>>> So, here (and above) is where we leave the NFS server and venture
>>> into the IOMMU layer. Adding the I/O folks for additional eyes.
>>>
>>> Can you give us the output of:
>>>
>>> $ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>>>
>>
>>
>> root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>> alloc_iova+0x92/0x290:
>> __alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
>> (inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
>> root@test:/usr/src/linux-6.13.6#
>
> OK so this is waiting for iova_rbtree_lock to get into the allocation
> slowpath since there was nothing suitable in the IOVA caches. Said
> slowpath under the lock is unfortunately prone to being quite slow,
> especially as the rbtree fills up with massive numbers of relatively
> small allocations (which I'm guessing I/O with a 4KB block size would
> tend towards). If you have 256 threads all contending the same path
> then they could certainly end up waiting a while, although they
> shouldn't be *permanently* stuck...

The reported PID is different on every stack dump, so this doesn't
look like a permanent stall for any of the nfsd threads.

But is there a way that NFSD can reduce the amount of IOVA
fragmentation it causes? I wouldn't think that a similar multi-threaded
4KB I/O workload on a local disk would result in the same kind of
stalling behavior.

I also note that the stack trace is the same for each occurrence:

[ 1047.817528] alloc_iova+0x92/0x290
[ 1047.817534] ? __alloc_pages_noprof+0x191/0x1280
[ 1047.817542] ? current_time+0x2d/0x120
[ 1047.817548] alloc_iova_fast+0x1fb/0x400
[ 1047.817554] iommu_dma_alloc_iova+0xa2/0x190
[ 1047.817559] iommu_dma_map_sg+0x447/0x4e0
[ 1047.817566] __dma_map_sg_attrs+0x139/0x1b0
[ 1047.817572] dma_map_sgtable+0x21/0x50
[ 1047.817578] rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 1047.817720] ? srso_return_thunk+0x5/0x5f
[ 1047.817729] svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 1047.817757] svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 1047.817774] ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 1047.817791] ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 1047.817810] svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 1047.817828] svc_rdma_send_write_list+0x144/0x290 [rpcrdma]

svc_rdma_send_write_list() appears in all of these. This function
assembles an NFS READ response that will use an RDMA Write to convey
the I/O payload to the NFS client.

--
Chuck Lever
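
As an illustration of the contention pattern Robin describes, here is a
minimal user-space sketch (not kernel code): each thread has a small
private cache standing in for the per-CPU IOVA rcaches and falls back
to an allocator serialized by one mutex standing in for
iova_rbtree_lock. The thread count, cache depth, batch size, and the
busy loop standing in for the rbtree walk are arbitrary assumptions;
the point is only that many small allocations which overflow the caches
funnel every thread through a single lock.

/*
 * Minimal user-space model of the IOVA allocation contention pattern:
 * a per-thread fast path backed by a single-lock slow path.
 * Not kernel code -- the "rbtree walk" is a busy loop and the numbers
 * below are arbitrary.  Build with:  cc -O2 -pthread contention.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS        32      /* scale toward 256 to match the nfsd report */
#define OPS_PER_THREAD  100000
#define CACHE_DEPTH     2       /* tiny cache, so most batches overflow it */
#define BATCH           8       /* "pages" mapped per simulated RPC */

static pthread_mutex_t slowpath_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ iova_rbtree_lock */
static atomic_long slowpath_hits;

/* Stand-in for the search done with the lock held in the allocation slow path. */
static unsigned long slowpath_alloc(void)
{
	static unsigned long next = 1;
	unsigned long pfn;
	volatile int spin;

	pthread_mutex_lock(&slowpath_lock);
	for (spin = 0; spin < 200; spin++)
		;			/* fake tree walk, lock held */
	pfn = next++;
	pthread_mutex_unlock(&slowpath_lock);

	atomic_fetch_add(&slowpath_hits, 1);
	return pfn;
}

static void *worker(void *arg)
{
	unsigned long cache[CACHE_DEPTH], batch[BATCH];
	int cached = 0, i, op;

	(void)arg;
	for (op = 0; op < OPS_PER_THREAD; op++) {
		/* Allocate a batch: per-thread cache first (~ the IOVA
		 * rcaches), then the contended slow path. */
		for (i = 0; i < BATCH; i++)
			batch[i] = cached ? cache[--cached] : slowpath_alloc();

		/* Free the batch: only CACHE_DEPTH entries fit back in the
		 * cache, so the next batch misses again -- the analogue of
		 * many small mappings defeating the caches. */
		for (i = 0; i < BATCH && cached < CACHE_DEPTH; i++)
			cache[cached++] = batch[i];
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	printf("slow-path allocations: %ld of %d\n",
	       atomic_load(&slowpath_hits), NTHREADS * OPS_PER_THREAD);
	return 0;
}

Raising CACHE_DEPTH (or shrinking BATCH) makes nearly every allocation
hit the fast path and the lock disappears as a bottleneck, which is the
intuition behind reducing per-I/O mapping counts or fragmentation on
the NFSD side.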