Re: [PATCH] IB/cm: use rwlock for MAD agent lock

On Thu, Feb 20, 2025 at 05:56:12PM +0000, Jacob Moroni wrote:
> In workloads where there are many processes establishing
> connections using RDMA CM in parallel (large scale MPI),
> there can be heavy contention for mad_agent_lock in
> cm_alloc_msg.
> 
> This contention can occur while inside of a spin_lock_irq
> region, leading to interrupts being disabled for extended
> durations on many cores. Furthermore, it leads to the
> serialization of rdma_create_ah calls, which has negative
> performance impacts for NICs which are capable of processing
> multiple address handle creations in parallel.
> 
> The end result is the machine becoming unresponsive, hung
> task warnings, netdev TX timeouts, etc.

While the patch and fix seem reasonable, I'm somewhat surprised to
see it.

If you are running such a heavy workload, I'm shocked you don't
hit all the other nasty problems with RDMA CM scalability.

Is the issue that the AH creation is very slow for some reason? It has
been a longstanding peeve of mine that this is done under a spinlock
context; I've long felt that should be reworked and some of those
spinlocks converted to mutexes.
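
For anyone following along, here is a rough userspace sketch of why a
reader/writer lock helps with the contention described in the quoted
text. It is only an analogy and not the patch: it uses POSIX
pthread_rwlock_t rather than the kernel's rwlock_t, and agent_lock,
reader() and run_parallel_readers() are hypothetical names made up for
illustration. Each thread takes the lock in read mode and then waits at
a barrier while still holding it. With an exclusive lock the first
holder would block everyone else and the barrier would never release;
with a read lock all of them get inside together.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_READERS 16

/* Hypothetical stand-in for a lock guarding a rarely changing pointer. */
static pthread_rwlock_t agent_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t inside_barrier;
static atomic_int overlapped;

static void *reader(void *arg)
{
    (void)arg;
    pthread_rwlock_rdlock(&agent_lock);
    /*
     * Wait until every reader is inside the read-side critical
     * section. This only completes because read holders do not
     * exclude each other.
     */
    pthread_barrier_wait(&inside_barrier);
    atomic_fetch_add(&overlapped, 1);
    pthread_rwlock_unlock(&agent_lock);
    return NULL;
}

/*
 * Start n readers and return how many of them were inside the
 * read-side section at the same time, or -1 if n is out of range.
 */
int run_parallel_readers(int n)
{
    pthread_t tids[MAX_READERS];
    int i, rc;

    if (n < 1 || n > MAX_READERS)
        return -1;

    atomic_store(&overlapped, 0);
    rc = pthread_barrier_init(&inside_barrier, NULL, (unsigned)n);
    assert(rc == 0);

    for (i = 0; i < n; i++) {
        rc = pthread_create(&tids[i], NULL, reader, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < n; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }

    rc = pthread_barrier_destroy(&inside_barrier);
    assert(rc == 0);
    (void)rc;
    return atomic_load(&overlapped);
}
```

Calling run_parallel_readers(4) from a small main() and building with
-pthread should return 4. Note that this only shows the read side
running in parallel; it says nothing about how slow the work done under
the lock is, which is the separate question raised above.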

Jason



