Re: much about ah objects in rxe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Apr 23, 2022 at 9:48 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>
> On 4/22/22 20:18, Zhu Yanjun wrote:
> > On Sat, Apr 23, 2022 at 6:11 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
> >>
> >> On 4/22/22 16:00, Jason Gunthorpe wrote:
> >>> On Fri, Apr 22, 2022 at 01:32:24PM -0500, Bob Pearson wrote:
> >>>> Jason,
> >>>>
> >>>> I am confused a little.
> >>>>
> >>>>  - xa_alloc_xxx internally takes xa->xa_lock with a spinlock but
> >>>>    has a gfp_t parameter which is normally GFP_KERNEL. So I trust them when they say
> >>>>    that it releases the lock around kmalloc's by 'magic' as you say.
> >>>>
> >>>>  - The only read side operation on the rxe pool xarrays is in rxe_pool_get_index() but
> >>>>    that will be protected by a rcu_read_lock so it can't deadlock with the write
> >>>>    side spinlocks regardless of type (plain, _bh, _saveirq)
> >>>>
> >>>>  - Apparently CM is calling ib_create_ah while holding spin locks. This would
> >>>>    call xa_alloc_xxx which would unlock xa_lock and call kmalloc(..., GFP_KERNEL)
> >>>>    which should cause a warning for AH. You say it does not because xarray doesn't
> >>>>    call might_sleep().
> >>>>
> >>>> I am not sure how might_sleep() works. When I add might_sleep() just ahead of
> >>>> xa_alloc_xxx() it does not cause warnings for CM test cases (e.g. rping.)
> >>>> Another way to study this would be to test for in_atomic() but
> >>>
> >>> might_sleep should work, it definately triggers from inside a
> >>> spinlock. Perhaps you don't have all the right debug kconfig enabled?
> >>>
> >>>> that seems to be discouraged and may not work as assumed. It's hard to reproduce
> >>>> evidence that ib_create_ah really has spinlocks held by the caller. I think it
> >>>> was seen in lockdep traces but I have a hard time reading them.
> >>>
> >>> There is a call to create_ah inside RDMA CM that is under a spinlock
> >>>
> >>>>  - There is a lot of effort trying to make 'deadlocks' go away. But the read side
> >>>>    is going to end as up rcu_read_lock so there soon will be no deadlocks with
> >>>>    rxe_pool_get_index() possible. xarrays were designed to work well with rcu
> >>>>    so it would better to just go ahead and do it. Verbs objects tend to be long
> >>>>    lived with lots of IO on each instance. This is a perfect use case for rcu.
> >>>
> >>> Yes
> >>>
> >>>> I think this means there is no reason for anything but a plain spinlock in rxe_alloc
> >>>> and rxe_add_to_pool.
> >>>
> >>> Maybe, are you sure there are no other xa spinlocks held from an IRQ?
> >>>
> >>> And you still have to deal with the create AH called in an atomic
> >>> region.
> >>
> >> There are only 3 references to the xarrays:
> >>
> >>         1. When an object is allocated. Either from rxe_alloc() which is called
> >>            an MR is registered or from rxe_add_to_pool() when the other
> >>            objects are created.
> >>         2. When an object is looked up from rxe_pool_get_index()
> >>         3. When an object is cleaned up from rxe_xxx_destroy() and similar.
> >>
> >> For non AH objects the create and destroy verbs are always called in process
> >> context and non-atomic and the lookup routine is normally called in soft IRQ
> >> context but doesn't take a lock when rcu is used so can't deadlock.
> >
> > Are you sure about this? There are about several non AH objects.
> > You can make sure that all in process context?
> >
> > And you can ensure it in the future?
>
> I added the line
>
>         WARN_ON(!in_task());
>
> to rxe_alloc and rxe_add_to_pool
>
> and it never triggered. I would be theoretically possible for someone to try

Your test environment does not mean that all the test environments
will not trigger this.

> to write a module that responds to an interrupt in some wigit and then
> tries to start a verbs session. But it would be very strange. It is reasonable
> to just declare that the verbs APIs are not callable in interrupt (soft or hard)

This will limit verbs APIs use.

> context. I believe this is tacitly understood and currently is true. It is a
> separate issue whether or not the caller is in 'atomic context' which includes
> holding a spinlock and implies that the thread cannot sleep. the xarray code
> if you look at the code for xa_alloc_cyclic() does take the xa_lock spinlock around
> _xa_alloc_cyclic() but they also say in the comments that they release that lock
> internally before calling kmalloc() and friends if the gfp_t parameter is
> GFP_KERNEL so that it is safe to sleep.  However cma.c calls ib_create_ah() while
> holding spinlocks (happens to be spin_lock_saveirq() but that doesn't matter here
> since any spinlock makes it bad to call kmalloc()). So we have to use GFP_ATOMIC
> for AH objects.
>
> It is clear that the assumption of the verbs APIs is that they are always called
> in process context.

How to ensure that they are not called in interrupt context? And some
kernel modules
also use the rdma.

> >
> >>
> >> For AH objects the create call is always called in process context but may
> >
> > How to ensure it?
>
> Same way. It is true now see the WARN_ON above.
> >
> >> or may not hold an irq spinlock so hard interrupts are disabled to prevent
> >> deadlocking CMs locks. The cleanup call is also in process context but also
> >> may or may not hold an irq spinlock (not sure if it happens). These calls
> >> can't deadlock each other for the xa_lock because there either won't be an
> >> interrupt or because the process context calls don't cause reentering the
> >> rxe code. They also can't deadlock with the lookup call when it is using rcu.
> >
> > From you, all the operations including create, destroy and lookup are
> > in process context or soft IRQ context.
> > How to ensure it? I mean that all operations are always in process or
> > soft IRQ context, and will not violate?
> > Even though in different calls ?
>
> We have no way to get into interrupt context. We get called from above through
> the verbs APIs in process context and from below by NAPI passing network packets
> to us in softirq context. We also internally defer work to tasklets which are
> always soft IRQs. Unless someone outside of the rdma-core subsystem called into
> verbs calls from interrupt context (weird) it can never happen. And has never
> happened. Again don't get confused with irqsave spinlocks. They are used in
> process or soft irq context sometimes and disable hardware interrupts to prevent
> a deadlock with another spinlock in a hardware interrupt handler. But we don't
> have any code that runs in hardware interrupt context to worry about. I doubt
> cma does either but many people use irqsave spinlocks when they are not needed.

How to verify that many people use unnecessary irqsave spinlock?

Zhu Yanjun

> >
> > Zhu Yanjun
> >
> >>
> >>>
> >>>> To sum up once we have rcu enabled the only required change is to use GFP_ATOMIC
> >>>> or find a way to pre-allocate for AH objects (assuming that I can convince myself
> >>>> that ib_create_ah really comes with spinlocks held).
> >>>
> >>> Possibly yes
> >>>
> >>> Jason
> >>
>



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux