On 2025/2/21 18:32, Eric Dumazet wrote:
On Fri, Feb 21, 2025 at 6:04 PM Zhu Yanjun <yanjun.zhu@xxxxxxxxx> wrote:
On 20.02.25 18:56, Jacob Moroni wrote:
In workloads where many processes establish connections
using RDMA CM in parallel (large-scale MPI), there can be
heavy contention for mad_agent_lock in cm_alloc_msg.
This contention can occur inside a spin_lock_irq critical
section, leaving interrupts disabled for extended durations
on many cores. It also serializes rdma_create_ah calls,
which hurts performance on NICs that can process multiple
address handle creations in parallel.
In the link:
https://www.cs.columbia.edu/~jae/4118-LAST/L12-interrupt-spinlock.html
"
...
spin_lock() / spin_unlock()
- must not lose the CPU while holding a spin lock; other threads will
  wait for the lock for a long time
- spin_lock() prevents kernel preemption by ++preempt_count; on a
  uniprocessor, that's all spin_lock() does
- must NOT call any function that can potentially sleep
  (e.g. kmalloc, copy_from_user)
- a hardware interrupt is OK unless the interrupt handler may try to
  take this spin lock
- spin locks are not recursive: the same thread locking twice will deadlock
- keep the critical section as small as possible
...
"
And from the source code, it seems that spin_lock/spin_unlock are not
related to interrupts.
So I wonder why spin_lock/spin_unlock would lead to "interrupts being
disabled for extended durations on many cores".
I am not against this commit. I am just curious how
spin_lock/spin_unlock is related to "interrupts being disabled".
Look at drivers/infiniband/core/cm.c
spin_lock_irqsave(&cm_id_priv->lock, flags);
Thanks a lot. So the spin_lock_irq mentioned in the commit message
should be spin_lock_irqsave?
Following the reproducer steps, I cannot reproduce this problem on
KVM guests. Maybe I need a more powerful host.
Anyway, read_lock should be a lighter lock than spin_lock.
Thanks,
Reviewed-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Zhu Yanjun
-> Then cm_alloc_msg() is called while hard IRQs are masked.