On Sat, Apr 23, 2022 at 9:48 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote: > > On 4/22/22 20:18, Zhu Yanjun wrote: > > On Sat, Apr 23, 2022 at 6:11 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote: > >> > >> On 4/22/22 16:00, Jason Gunthorpe wrote: > >>> On Fri, Apr 22, 2022 at 01:32:24PM -0500, Bob Pearson wrote: > >>>> Jason, > >>>> > >>>> I am confused a little. > >>>> > >>>> - xa_alloc_xxx internally takes xa->xa_lock with a spinlock but > >>>> has a gfp_t parameter which is normally GFP_KERNEL. So I trust them when they say > >>>> that it releases the lock around kmalloc's by 'magic' as you say. > >>>> > >>>> - The only read side operation on the rxe pool xarrays is in rxe_pool_get_index() but > >>>> that will be protected by a rcu_read_lock so it can't deadlock with the write > >>>> side spinlocks regardless of type (plain, _bh, _saveirq) > >>>> > >>>> - Apparently CM is calling ib_create_ah while holding spin locks. This would > >>>> call xa_alloc_xxx which would unlock xa_lock and call kmalloc(..., GFP_KERNEL) > >>>> which should cause a warning for AH. You say it does not because xarray doesn't > >>>> call might_sleep(). > >>>> > >>>> I am not sure how might_sleep() works. When I add might_sleep() just ahead of > >>>> xa_alloc_xxx() it does not cause warnings for CM test cases (e.g. rping.) > >>>> Another way to study this would be to test for in_atomic() but > >>> > >>> might_sleep should work, it definately triggers from inside a > >>> spinlock. Perhaps you don't have all the right debug kconfig enabled? > >>> > >>>> that seems to be discouraged and may not work as assumed. It's hard to reproduce > >>>> evidence that ib_create_ah really has spinlocks held by the caller. I think it > >>>> was seen in lockdep traces but I have a hard time reading them. > >>> > >>> There is a call to create_ah inside RDMA CM that is under a spinlock > >>> > >>>> - There is a lot of effort trying to make 'deadlocks' go away. But the read side > >>>> is going to end as up rcu_read_lock so there soon will be no deadlocks with > >>>> rxe_pool_get_index() possible. xarrays were designed to work well with rcu > >>>> so it would better to just go ahead and do it. Verbs objects tend to be long > >>>> lived with lots of IO on each instance. This is a perfect use case for rcu. > >>> > >>> Yes > >>> > >>>> I think this means there is no reason for anything but a plain spinlock in rxe_alloc > >>>> and rxe_add_to_pool. > >>> > >>> Maybe, are you sure there are no other xa spinlocks held from an IRQ? > >>> > >>> And you still have to deal with the create AH called in an atomic > >>> region. > >> > >> There are only 3 references to the xarrays: > >> > >> 1. When an object is allocated. Either from rxe_alloc() which is called > >> an MR is registered or from rxe_add_to_pool() when the other > >> objects are created. > >> 2. When an object is looked up from rxe_pool_get_index() > >> 3. When an object is cleaned up from rxe_xxx_destroy() and similar. > >> > >> For non AH objects the create and destroy verbs are always called in process > >> context and non-atomic and the lookup routine is normally called in soft IRQ > >> context but doesn't take a lock when rcu is used so can't deadlock. > > > > Are you sure about this? There are about several non AH objects. > > You can make sure that all in process context? > > > > And you can ensure it in the future? > > I added the line > > WARN_ON(!in_task()); > > to rxe_alloc and rxe_add_to_pool > > and it never triggered. I would be theoretically possible for someone to try Your test environment does not mean that all the test environments will not trigger this. > to write a module that responds to an interrupt in some wigit and then > tries to start a verbs session. But it would be very strange. It is reasonable > to just declare that the verbs APIs are not callable in interrupt (soft or hard) This will limit verbs APIs use. > context. I believe this is tacitly understood and currently is true. It is a > separate issue whether or not the caller is in 'atomic context' which includes > holding a spinlock and implies that the thread cannot sleep. the xarray code > if you look at the code for xa_alloc_cyclic() does take the xa_lock spinlock around > _xa_alloc_cyclic() but they also say in the comments that they release that lock > internally before calling kmalloc() and friends if the gfp_t parameter is > GFP_KERNEL so that it is safe to sleep. However cma.c calls ib_create_ah() while > holding spinlocks (happens to be spin_lock_saveirq() but that doesn't matter here > since any spinlock makes it bad to call kmalloc()). So we have to use GFP_ATOMIC > for AH objects. > > It is clear that the assumption of the verbs APIs is that they are always called > in process context. How to ensure that they are not called in interrupt context? And some kernel modules also use the rdma. > > > >> > >> For AH objects the create call is always called in process context but may > > > > How to ensure it? > > Same way. It is true now see the WARN_ON above. > > > >> or may not hold an irq spinlock so hard interrupts are disabled to prevent > >> deadlocking CMs locks. The cleanup call is also in process context but also > >> may or may not hold an irq spinlock (not sure if it happens). These calls > >> can't deadlock each other for the xa_lock because there either won't be an > >> interrupt or because the process context calls don't cause reentering the > >> rxe code. They also can't deadlock with the lookup call when it is using rcu. > > > > From you, all the operations including create, destroy and lookup are > > in process context or soft IRQ context. > > How to ensure it? I mean that all operations are always in process or > > soft IRQ context, and will not violate? > > Even though in different calls ? > > We have no way to get into interrupt context. We get called from above through > the verbs APIs in process context and from below by NAPI passing network packets > to us in softirq context. We also internally defer work to tasklets which are > always soft IRQs. Unless someone outside of the rdma-core subsystem called into > verbs calls from interrupt context (weird) it can never happen. And has never > happened. Again don't get confused with irqsave spinlocks. They are used in > process or soft irq context sometimes and disable hardware interrupts to prevent > a deadlock with another spinlock in a hardware interrupt handler. But we don't > have any code that runs in hardware interrupt context to worry about. I doubt > cma does either but many people use irqsave spinlocks when they are not needed. How to verify that many people use unnecessary irqsave spinlock? Zhu Yanjun > > > > Zhu Yanjun > > > >> > >>> > >>>> To sum up once we have rcu enabled the only required change is to use GFP_ATOMIC > >>>> or find a way to pre-allocate for AH objects (assuming that I can convince myself > >>>> that ib_create_ah really comes with spinlocks held). > >>> > >>> Possibly yes > >>> > >>> Jason > >> >