Re: [PATCH 2/7] IB/rxe: Disable completion upcalls when a CQ is destroyed

On 7/27/17, 5:35 AM, "monisonlists@xxxxxxxxx on behalf of Moni Shoua"
<monisonlists@xxxxxxxxx on behalf of monis@xxxxxxxxxxxx> wrote:

>On Tue, Jul 25, 2017 at 4:39 PM, Andrew Boyer <andrew.boyer@xxxxxxxx> wrote:
>> This prevents the stack from accessing userspace objects while they
>> are being torn down.
>>
>> Fixes: 8700e3e7c485 ("Soft RoCE driver")
>> Signed-off-by: Andrew Boyer <andrew.boyer@xxxxxxxx>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_cq.c    | 19 +++++++++++++++++++
>>  drivers/infiniband/sw/rxe/rxe_loc.h   |  2 ++
>>  drivers/infiniband/sw/rxe/rxe_verbs.c |  2 ++
>>  drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
>>  4 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
>> index 49fe42c..c4aabf7 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_cq.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_cq.c
>> @@ -69,6 +69,14 @@ int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
>>  static void rxe_send_complete(unsigned long data)
>>  {
>>         struct rxe_cq *cq = (struct rxe_cq *)data;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&cq->cq_lock, flags);
>> +       if (cq->is_dying) {
>> +               spin_unlock_irqrestore(&cq->cq_lock, flags);
>> +               return;
>> +       }
>> +       spin_unlock_irqrestore(&cq->cq_lock, flags);
>What if the CQ is destroyed here, after you pass the is_dying test?
>Maybe you should think of a solution based on ref counting.
>>         cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
>>  }
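
Moni's question points at a check-then-act window: cq_lock is dropped after is_dying is tested, so nothing stops a destroy from running between the unlock and the comp_handler() call. One way to close that window without reference counts is to keep the lock held across the upcall, so that once the destroy path has set the flag under the same lock, no upcall can be in progress or start. The sketch below is a user-space model of that idea, not rxe code: struct cq_model, cq_send_complete(), cq_disable() and count_upcall() are hypothetical names, and a pthread mutex stands in for cq->cq_lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of a CQ; names are illustrative, not rxe API. */
struct cq_model {
	pthread_mutex_t lock;              /* stands in for cq->cq_lock */
	bool is_dying;                     /* set once by the destroy path */
	void (*comp_handler)(void *ctx);   /* consumer's completion upcall */
	void *ctx;                         /* consumer state the handler touches */
};

void cq_model_init(struct cq_model *cq, void (*handler)(void *), void *ctx)
{
	pthread_mutex_init(&cq->lock, NULL);
	cq->is_dying = false;
	cq->comp_handler = handler;
	cq->ctx = ctx;
}

/* Completion path: the flag test and the upcall happen under one lock hold,
 * so cq_disable() cannot slip in between them. */
void cq_send_complete(struct cq_model *cq)
{
	pthread_mutex_lock(&cq->lock);
	if (!cq->is_dying)
		cq->comp_handler(cq->ctx);
	pthread_mutex_unlock(&cq->lock);
}

/* Destroy path: once this returns, no upcall is running and none will start. */
void cq_disable(struct cq_model *cq)
{
	pthread_mutex_lock(&cq->lock);
	cq->is_dying = true;
	pthread_mutex_unlock(&cq->lock);
}

/* Example handler for the model: counts upcalls through ctx. */
void count_upcall(void *ctx)
{
	++*(int *)ctx;
}
```

The trade-off is that the handler now runs with the lock held, so it must not take that lock itself (for example, if re-arming or polling the same CQ takes it), and under a kernel spinlock it must not sleep. In the driver, a comparable guarantee might instead come from having the destroy path wait for the completion tasklet (for example with tasklet_kill()), assuming rxe_send_complete() runs as a tasklet, as its unsigned long data argument suggests; that is an assumption, not something established in this thread.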

Hello Moni,
Thank you for all of the reviews. I'll address the commit messages, etc. in a
revised series.

This is the situation that causes a crash here:
 - Userspace program exits
 - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
   ib_destroy_cq(), etc., and releasing/freeing the UCQ
   - The QP still has tasklets running, so it isn't destroyed yet
   - The CQ is referenced (twice) by the QP, so the CQ isn't destroyed yet
   - The UCQ is kfree()'d!
 - A send work request completes
 - rxe_send_complete() calls cq->ibcq.comp_handler()
 - ib_uverbs_comp_handler() runs and crashes: it checks the event queue
   for is_closed, but it has no way to check whether the ib_ucq_object
   is still alive
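
The sequence above can be reduced to a small single-threaded model showing why a reference count on the CQ does not help here: the QP's references keep the CQ alive, but the consumer-side object the upcall touches (the UCQ) has its own, shorter lifetime. Everything in the sketch is hypothetical and simplified: ucq_model, cq_ref_model and the functions are invented for illustration, the UCQ's kfree() is modeled with a released flag so the example stays safe to run, the starting count of 3 assumes one owner reference plus the QP's two, and setting is_dying in the destroy path is presumably what this patch does (its rxe_verbs.c hunk is not quoted here).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the uverbs-side object (the UCQ). */
struct ucq_model {
	bool released;              /* models kfree(); touching it afterwards is the bug */
	int upcalls_after_release;  /* counts upcalls that would have crashed */
};

/* Hypothetical stand-in for the driver CQ. */
struct cq_ref_model {
	int refcnt;                 /* assumed: 1 for the owner + 1 per QP reference */
	bool is_dying;
	struct ucq_model *ucq;      /* consumer object reached by the upcall */
};

static void uverbs_comp_handler_model(struct ucq_model *ucq)
{
	if (ucq->released)
		ucq->upcalls_after_release++;   /* the crash in the real stack */
}

/* Consumer destroys the CQ: drop the owner's reference and, optionally,
 * mark the CQ dying so later completions skip the upcall. */
void destroy_cq_model(struct cq_ref_model *cq, bool set_dying)
{
	cq->refcnt--;
	if (set_dying)
		cq->is_dying = true;
}

/* A send work request still completes because the QP's tasklets are running. */
void send_complete_model(struct cq_ref_model *cq)
{
	if (!cq->is_dying)
		uverbs_comp_handler_model(cq->ucq);
}

/* Replays the cleanup order described above; returns the number of upcalls
 * that reached the released UCQ and reports the CQ's remaining refcount. */
int replay_cleanup(bool set_dying, int *refcnt_left)
{
	struct ucq_model ucq = { .released = false, .upcalls_after_release = 0 };
	struct cq_ref_model cq = { .refcnt = 3, .is_dying = false, .ucq = &ucq };

	destroy_cq_model(&cq, set_dying);   /* 3 -> 2: the QP still holds the CQ */
	ucq.released = true;                /* uverbs frees the UCQ regardless */
	send_complete_model(&cq);           /* late completion from the QP */
	*refcnt_left = cq.refcnt;
	return ucq.upcalls_after_release;
}
```

In both replays the CQ survives with two references; only the flag decides whether the late completion reaches the released UCQ.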

As you can see, the reference counting on the CQ doesn't protect us: it
keeps the CQ alive, but not the UCQ. There's no interface I could find that
would deregister the UCQ from the CQ. I didn't think adding reference
counting to the UCQ was a good way to go, since the solution I posted above
is so much simpler (if hacky).
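
One possible middle ground between the ref-counting suggestion and the concern above about refcounting the UCQ is to count in-flight upcalls on the CQ itself: the completion path takes an "upcall reference" only while the CQ is not dying, and the destroy path sets the flag and then waits for running upcalls to drain, after which the consumer may free the UCQ. This is a hypothetical user-space sketch (struct cq_drain_model and its functions are invented names, pthread primitives stand in for kernel ones), not a proposal from this thread; in the kernel it would more likely be built from existing primitives such as a completion.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: reference-count the upcalls, not the UCQ. */
struct cq_drain_model {
	pthread_mutex_t lock;
	pthread_cond_t drained;     /* signalled when in_flight drops to 0 */
	bool is_dying;
	int in_flight;              /* upcalls currently running */
};

void cq_drain_init(struct cq_drain_model *cq)
{
	pthread_mutex_init(&cq->lock, NULL);
	pthread_cond_init(&cq->drained, NULL);
	cq->is_dying = false;
	cq->in_flight = 0;
}

/* Completion path, step 1: take an upcall reference unless the CQ is dying.
 * The caller invokes comp_handler() only if this returns true. */
bool cq_upcall_begin(struct cq_drain_model *cq)
{
	bool ok;

	pthread_mutex_lock(&cq->lock);
	ok = !cq->is_dying;
	if (ok)
		cq->in_flight++;
	pthread_mutex_unlock(&cq->lock);
	return ok;
}

/* Completion path, step 2: drop the reference after comp_handler() returns. */
void cq_upcall_end(struct cq_drain_model *cq)
{
	pthread_mutex_lock(&cq->lock);
	if (--cq->in_flight == 0)
		pthread_cond_broadcast(&cq->drained);
	pthread_mutex_unlock(&cq->lock);
}

/* Destroy path: refuse new upcalls, then wait for running ones to finish.
 * Only after this returns may the consumer free its state (the UCQ). */
void cq_disable_and_drain(struct cq_drain_model *cq)
{
	pthread_mutex_lock(&cq->lock);
	cq->is_dying = true;
	while (cq->in_flight > 0)
		pthread_cond_wait(&cq->drained, &cq->lock);
	pthread_mutex_unlock(&cq->lock);
}
```

Here the handler runs without the lock held, at the cost of a counter and a wait in the destroy path; cq_disable_and_drain() must never be called from inside the handler, or it would wait on itself.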

It looks like ib_uverbs_cleanup_ucontext() is gone in 4.12. I don't know
whether whatever replaced it already addresses this issue, by accident or
by design.

Does this make sense? Do you have a better idea for a fix?

Thank you,
Andrew

P.S. Sorry for the Outlook garbage formatting.
