On Thu, Feb 20, 2025 at 05:56:12PM +0000, Jacob Moroni wrote:
> In workloads where there are many processes establishing
> connections using RDMA CM in parallel (large scale MPI),
> there can be heavy contention for mad_agent_lock in
> cm_alloc_msg.
>
> This contention can occur while inside of a spin_lock_irq
> region, leading to interrupts being disabled for extended
> durations on many cores. Furthermore, it leads to the
> serialization of rdma_create_ah calls, which has negative
> performance impacts for NICs which are capable of processing
> multiple address handle creations in parallel.
>
> The end result is the machine becoming unresponsive, hung
> task warnings, netdev TX timeouts, etc.

While the patch and fix seem reasonable, I'm somewhat surprised to see
it. If you are running at such a high workload then I'm shocked you
don't hit all the other nasty problems with RDMA CM scalability?

Is the issue that the AH creation is very slow for some reason? It has
been a longstanding peeve of mine that this is done under a spinlock
context. I've long felt that should be reworked and some of those
spinlocks converted to mutexes.

Jason