Re: NFS Server Issues with RDMA in Kernel 6.13.6

On 3/14/25 6:43 AM, Robin Murphy wrote:
> On 2025-03-13 7:20 pm, Lucas via Bugspray Bot wrote:
> [...]
>> system: Supermicro AS-4124GS-TNR
>> cpu: AMD EPYC 7H12 64-Core Processor
>> ram: 512G
>> rdma nic: Mellanox Technologies MT2910 Family [ConnectX-7]
>>
>>
>>>> [  976.677373]  __dma_map_sg_attrs+0x139/0x1b0
>>>> [  976.677380]  dma_map_sgtable+0x21/0x50
>>>
>>> So, here (and above) is where we leave the NFS server and venture into
>>> the IOMMU layer. Adding the I/O folks for additional eyes.
>>>
>>> Can you give us the output of:
>>>
>>>    $ scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>>>
>>
>>
>> root@test:/usr/src/linux-6.13.6# scripts/faddr2line drivers/iommu/iova.o alloc_iova+0x92
>> alloc_iova+0x92/0x290:
>> __alloc_and_insert_iova_range at /usr/src/linux-6.13.6/drivers/iommu/iova.c:180
>> (inlined by) alloc_iova at /usr/src/linux-6.13.6/drivers/iommu/iova.c:263
>> root@test:/usr/src/linux-6.13.6#
> 
> OK so this is waiting for iova_rbtree_lock to get into the allocation
> slowpath since there was nothing suitable in the IOVA caches. Said
> slowpath under the lock is unfortunately prone to being quite slow,
> especially as the rbtree fills up with massive numbers of relatively
> small allocations (which I'm guessing I/O with a 4KB block size would
> tend towards). If you have 256 threads all contending the same path then
> they could certainly end up waiting a while, although they shouldn't be
> *permanently* stuck...
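
For anyone not following along in iova.c, the pattern Robin describes
looks roughly like this (my hand-condensed sketch of
__alloc_and_insert_iova_range(), not the literal 6.13.6 source):

	/*
	 * Sketch only: the slow path serializes every allocator on one
	 * per-domain spinlock while it walks the rbtree looking for a
	 * free gap, so lock hold time grows with the number of
	 * outstanding allocations in the tree.
	 */
	static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
			unsigned long size, unsigned long limit_pfn,
			struct iova *new, bool size_aligned)
	{
		struct rb_node *curr;
		unsigned long flags;

		/* roughly where the faddr2line output above points */
		spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
		for (curr = rb_last(&iovad->rbroot); curr; curr = rb_prev(curr)) {
			/* scan downward until a gap of @size PFNs is found */
		}
		/* link @new into the rbtree, then drop the lock */
		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
		return 0;
	}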

The reported PID is different on every stack dump, so this doesn't look
like a permanent stall for any of the nfsd threads.

But is there a way that NFSD can reduce the amount of IOVA fragmentation
it causes? I wouldn't think that a similar multi-threaded 4KB I/O
workload on a local disk would result in the same kind of stalling
behavior.

I also note that the stack trace is the same for each occurrence:

[ 1047.817528]  alloc_iova+0x92/0x290
[ 1047.817534]  ? __alloc_pages_noprof+0x191/0x1280
[ 1047.817542]  ? current_time+0x2d/0x120
[ 1047.817548]  alloc_iova_fast+0x1fb/0x400
[ 1047.817554]  iommu_dma_alloc_iova+0xa2/0x190
[ 1047.817559]  iommu_dma_map_sg+0x447/0x4e0
[ 1047.817566]  __dma_map_sg_attrs+0x139/0x1b0
[ 1047.817572]  dma_map_sgtable+0x21/0x50
[ 1047.817578]  rdma_rw_ctx_init+0x6c/0x820 [ib_core]
[ 1047.817720]  ? srso_return_thunk+0x5/0x5f
[ 1047.817729]  svc_rdma_rw_ctx_init+0x49/0xf0 [rpcrdma]
[ 1047.817757]  svc_rdma_build_writes+0xa5/0x210 [rpcrdma]
[ 1047.817774]  ? __pfx_svc_rdma_pagelist_to_sg+0x10/0x10 [rpcrdma]
[ 1047.817791]  ? svc_rdma_send_write_list+0xf4/0x290 [rpcrdma]
[ 1047.817810]  svc_rdma_xb_write+0x7d/0xb0 [rpcrdma]
[ 1047.817828]  svc_rdma_send_write_list+0x144/0x290 [rpcrdma]

svc_rdma_send_write_list() appears in all of these.

This function assembles an NFS READ response that will use an RDMA Write
to convey the I/O payload to the NFS client.
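
Reading the trace bottom-up, the handoff into the DMA API happens in
rdma_rw_ctx_init(). A condensed sketch of that call as svcrdma makes
it (paraphrased from memory, not the literal svc_rdma_rw.c source):

	/*
	 * Sketch: each RDMA Write context DMA-maps its payload
	 * scatterlist here.  With an IOMMU enabled, the dma_map_sgtable()
	 * inside rdma_rw_ctx_init() ends up in iommu_dma_map_sg(), which
	 * calls iommu_dma_alloc_iova() -- one IOVA allocation per Write
	 * context, sized to that context's payload.
	 */
	ret = rdma_rw_ctx_init(&ctxt->rw_ctx, rdma->sc_qp, rdma->sc_port_num,
			       ctxt->rw_sg_table.sgl, ctxt->rw_nents,
			       0, offset, handle, direction);

So with a 4KB block size, each of those allocations is tiny, which
fits Robin's picture of the rbtree filling up with massive numbers of
small entries.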


-- 
Chuck Lever



