Re: [PATCH] IB/cm: use rwlock for MAD agent lock

> If you are running at such a high workload then I'm shocked you don't
> hit all the other nasty problems with RDMA CM scalability?

It could be that we just haven't hit those issues yet :)

This serialization was slowing things down so much that I think it
has been masking some other issues. For instance, I just discovered
a bug in rping's persistent server mode (I'll be sending a GitHub PR
soon), which seems to be due to a race condition we started hitting
after this fix.

> Is the issue that the AH creation is very slow for some reason? It has
> been a longstanding peeve of mine that this is done under a spinlock
> context, I've long felt that should be reworked and some of those
> spinlocks converted to mutexes.

Yes, that's exactly it. We have fairly high tail latencies for creating
address handles. By removing the serialization, we can at least take
advantage of queueing, which seems to help a lot. It would be really
great if this could move out of an atomic context.

Thanks,
Jake

On Fri, Feb 21, 2025 at 12:32 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Fri, Feb 21, 2025 at 6:04 PM Zhu Yanjun <yanjun.zhu@xxxxxxxxx> wrote:
> >
> > On 20.02.25 18:56, Jacob Moroni wrote:
> > > In workloads where there are many processes establishing
> > > connections using RDMA CM in parallel (large scale MPI),
> > > there can be heavy contention for mad_agent_lock in
> > > cm_alloc_msg.
> > >
> > > This contention can occur while inside of a spin_lock_irq
> > > region, leading to interrupts being disabled for extended
> > > durations on many cores. Furthermore, it leads to the
> > > serialization of rdma_create_ah calls, which has negative
> > > performance impacts for NICs which are capable of processing
> > > multiple address handle creations in parallel.
> >
> > In the link:
> > https://www.cs.columbia.edu/~jae/4118-LAST/L12-interrupt-spinlock.html
> > "
> > ...
> > spin_lock() / spin_unlock()
> >
> > must not lose CPU while holding a spin lock, other threads will wait for
> > the lock for a long time
> >
> > spin_lock() prevents kernel preemption by ++preempt_count in
> > uniprocessor, that’s all spin_lock() does
> >
> > must NOT call any function that can potentially sleep
> > ex) kmalloc, copy_from_user
> >
> > hardware interrupt is ok unless the interrupt handler may try to lock
> > this spin lock
> > spin lock not recursive: same thread locking twice will deadlock
> >
> > keep the critical section as small as possible
> > ...
> > "
> > And from the source code, it seems that spin_lock/spin_unlock are not
> > related to interrupts.
> >
> > I wonder why "leading to interrupts being disabled for extended
> > durations on many cores" with spin_lock/spin_unlock?
> >
> > I am not against this commit. I am just curious why
> > spin_lock/spin_unlock are related to "interrupts being disabled".
>
> Look at drivers/infiniband/core/cm.c
>
> spin_lock_irqsave(&cm_id_priv->lock, flags);
>
> -> Then call cm_alloc_msg() while hard IRQs are masked.




