On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
[...]
system: Supermicro AS-4124GS-TNR
cpu: AMD EPYC 7H12 64-Core Processor
ram: 512G
rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
[ 976.677373] __dma_map_sg_attrs+0x139/0x1b0
[ 976.677380] dma_map_sgtable+0x21/0x50
So, here (and above) is where we leave the NFS server and venture into
the IOMMU layer. Adding the I/O folks for additional eyes.
Can you give us the output of:
$ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
alloc_iova+0x92/0x290:
__alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
(inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
root@test:/usr/src/linux-6.13.6#
OK so this is waiting for iova_rbtree_lock to get into the allocation
slowpath since there was nothing suitable in the IOVA caches. Said
slowpath under the lock is unfortunately prone to being quite slow,
especially as the rbtree fills up with massive numbers of relatively
small allocations (which I'm guessing I/O with a 4KB block size would
tend towards). If you have 256 threads all contending for the same lock then
they could certainly end up waiting a while, although they shouldn't be
*permanently* stuck...
Thanks,
Robin.
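The fast-path/slowpath split Robin describes can be illustrated with a toy model. The sketch below is hypothetical and is not the kernel's drivers/iommu/iova.c. A small `cache[]` stands in for the per-CPU IOVA caches, and a sorted array guarded by a spinlock stands in for the rbtree and iova_rbtree_lock. Every allocation is one page frame. The names `iova_alloc`, `iova_free`, `tree_lock` and `slowpath_steps` are invented for illustration. The model only mimics the downward gap walk, and it shows how the time spent under the lock grows as small allocations pack in below the limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, NOT the kernel's drivers/iommu/iova.c:
 * - cache[] stands in for the per-CPU IOVA caches (no global lock);
 * - allocated[] is a sorted array standing in for the rbtree that the
 *   real slowpath searches while holding iova_rbtree_lock;
 * - every allocation is exactly one page frame (pfn). */
#define MAX_ALLOC  4096
#define CACHE_SIZE 8

static unsigned long allocated[MAX_ALLOC]; /* sorted ascending, unique */
static size_t nalloc;
static unsigned long cache[CACHE_SIZE];    /* freed pfns, still reserved */
static size_t ncached;
static atomic_flag tree_lock = ATOMIC_FLAG_INIT;
unsigned long slowpath_steps;              /* entries walked under the lock */

static void tree_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&tree_lock, memory_order_acquire))
        ; /* spin: every caller that misses the cache serialises here */
}

static void tree_lock_release(void)
{
    atomic_flag_clear_explicit(&tree_lock, memory_order_release);
}

/* Returns a free pfn <= limit_pfn, or 0 on failure. */
unsigned long iova_alloc(unsigned long limit_pfn)
{
    if (ncached > 0)
        return cache[--ncached]; /* fast path (this model ignores limit_pfn here) */

    tree_lock_acquire();
    /* Slow path: walk down from the top looking for a gap. The walk gets
     * longer as more small allocations pack in below limit_pfn. */
    unsigned long cand = limit_pfn;
    long i = (long)nalloc - 1;
    while (i >= 0 && allocated[i] >= cand) {
        if (allocated[i] == cand)
            cand--;              /* occupied: try the next pfn down */
        i--;
        slowpath_steps++;
    }
    if (cand == 0 || nalloc == MAX_ALLOC) { /* pfn 0 treated as invalid */
        tree_lock_release();
        return 0;
    }
    /* Insert cand at index i + 1 to keep allocated[] sorted. */
    memmove(&allocated[i + 2], &allocated[i + 1],
            (nalloc - (size_t)(i + 1)) * sizeof allocated[0]);
    allocated[i + 1] = cand;
    nalloc++;
    tree_lock_release();
    return cand;
}

void iova_free(unsigned long pfn)
{
    if (ncached < CACHE_SIZE) {
        cache[ncached++] = pfn;  /* fast path: stays reserved, no lock taken */
        return;
    }
    tree_lock_acquire();         /* cache full: return pfn to the "tree" */
    for (size_t j = 0; j < nalloc; j++) {
        if (allocated[j] == pfn) {
            memmove(&allocated[j], &allocated[j + 1],
                    (nalloc - j - 1) * sizeof allocated[0]);
            nalloc--;
            break;
        }
    }
    tree_lock_release();
}
```

In this model, ten back-to-back slowpath allocations under the same limit walk 0 + 1 + ... + 9 = 45 entries in total, all while holding the lock. A cache hit after `iova_free()` walks none. With many threads missing the cache at once, each one queues behind that lengthening walk. Per the reply above, that should make them slow but not permanently stuck.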