On 2025/2/21 18:32, Eric Dumazet wrote:
On Fri, Feb 21, 2025 at 6:04 PM Zhu Yanjun <yanjun.zhu@xxxxxxxxx> wrote:
On 20.02.25 18:56, Jacob Moroni wrote:
In workloads where many processes establish connections
using RDMA CM in parallel (large-scale MPI), there can be
heavy contention for mad_agent_lock in cm_alloc_msg.
This contention can occur inside a spin_lock_irq critical
section, leaving interrupts disabled for extended durations
on many cores. It also serializes rdma_create_ah calls,
which hurts performance on NICs that can process multiple
address handle creations in parallel.
In the link:
https://www.cs.columbia.edu/~jae/4118-LAST/L12-interrupt-spinlock.html
"
...
spin_lock() / spin_unlock()
- must not lose the CPU while holding a spin lock; other threads will
  wait for the lock for a long time
- spin_lock() prevents kernel preemption by ++preempt_count; on a
  uniprocessor, that's all spin_lock() does
- must NOT call any function that can potentially sleep
  (e.g. kmalloc, copy_from_user)
- a hardware interrupt is OK unless the interrupt handler may try to
  take this spin lock
- spin locks are not recursive: the same thread locking twice will deadlock
- keep the critical section as small as possible
...
"
And from the source code, it seems that spin_lock/spin_unlock are not
related to interrupts.
So I wonder why spin_lock/spin_unlock would lead to "interrupts being
disabled for extended durations on many cores".
I am not against this commit. I am just curious how
spin_lock/spin_unlock is related to "interrupts being disabled".
Look at drivers/infiniband/core/cm.c
spin_lock_irqsave(&cm_id_priv->lock, flags);
Thanks a lot. So the spin_lock_irq mentioned in the commit message
should be spin_lock_irqsave?
Following the reproducer steps, I cannot reproduce this problem on
KVM guests. Maybe I need a more powerful host.
Anyway, read_lock should be a lighter lock than spin_lock.
Thanks,
Reviewed-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Zhu Yanjun
-> Then cm_alloc_msg() is called while hard IRQs are masked.