On Sun, May 26, 2024 at 11:31:25AM +0300, Sagi Grimberg wrote:
> ib_drain_qp does not drain a shared recv queue (because it is
> shared). However, in the absence of any guarantee that the recv
> completions were consumed, the ulp can reference these completions
> after draining the qp and freeing its associated resources, which
> is a uaf [1].
>
> We cannot drain a srq like a normal rq, however in ib_drain_qp,
> once the qp has moved to error state, we reap the recv_cq once in
> order to prevent consumption of recv completions after the drain.
>
> [1]:
> --
> [199856.569999] Unable to handle kernel paging request at virtual address 002248778adfd6d0
> <....>
> [199856.721701] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> <....>
> [199856.827281] Call trace:
> [199856.829847] nvmet_parse_admin_cmd+0x34/0x178 [nvmet]
> [199856.835007] nvmet_req_init+0x2e0/0x378 [nvmet]
> [199856.839640] nvmet_rdma_handle_command+0xa4/0x2e8 [nvmet_rdma]
> [199856.845575] nvmet_rdma_recv_done+0xcc/0x240 [nvmet_rdma]
> [199856.851109] __ib_process_cq+0x84/0xf0 [ib_core]
> [199856.855858] ib_cq_poll_work+0x34/0xa0 [ib_core]
> [199856.860587] process_one_work+0x1ec/0x4a0
> [199856.864694] worker_thread+0x48/0x490
> [199856.868453] kthread+0x158/0x160
> [199856.871779] ret_from_fork+0x10/0x18
> --
>
> Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> ---
> Note this patch is not yet tested, but sending it for visibility and
> early feedback. While nothing prevents ib_drain_qp from processing a cq
> directly (even if it has another context), I am not convinced that all
> the upper layers are free of assumptions about a single context
> consuming the cq, even while it is being drained. It is also
> possible to add an ib_reap_cq that fences the cq poll context before
> reaping the cq, but this may have other side effects.

Did you have a chance to test this patch?

I looked at the code and it seems to be a correct change, but I also
don't know about all the ULP assumptions.

Thanks

>
> This crash was seen in the wild, and is not easy to reproduce. I suspect
> that moving the nvmet_wq to be unbound expedited the teardown process,
> exposing a possible uaf for srq-attached qps (which nvmet-rdma has a
> mode of using).
>
>
>  drivers/infiniband/core/verbs.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 94a7f3b0c71c..580e9019e96a 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -2962,6 +2962,17 @@ void ib_drain_qp(struct ib_qp *qp)
>  	ib_drain_sq(qp);
>  	if (!qp->srq)
>  		ib_drain_rq(qp);
> +	else {
> +		/*
> +		 * We cannot drain a srq, however the qp is in error state,
> +		 * and will not generate new recv completions, hence it should
> +		 * be enough to reap the recv cq to clean up any recv completions
> +		 * that may have been placed before we drained. Without this,
> +		 * nothing prevents the ulp from freeing resources and only then
> +		 * consuming the recv completion.
> +		 */
> +		ib_process_cq_direct(qp->recv_cq, -1);
> +	}
>  }
>  EXPORT_SYMBOL(ib_drain_qp);
>
> --
> 2.40.1
>
>
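
For my own understanding of the ordering this protects, here is a minimal
sketch of a srq-based ULP teardown path. The names (ulp_queue,
ulp_destroy_queue_ib, ulp_free_cmds) are made up for illustration, loosely
modeled on an nvmet-rdma style flow, not the actual symbols:

static void ulp_destroy_queue_ib(struct ulp_queue *queue)
{
	/*
	 * Moves the qp to error state and drains the sq. With this patch,
	 * for a srq-attached qp it also reaps the recv cq once, so any recv
	 * completion already queued for this qp is consumed here, in this
	 * context, before we return.
	 */
	ib_drain_qp(queue->qp);
	rdma_destroy_qp(queue->cm_id);

	/*
	 * The srq itself is shared and stays alive. Freeing the per-queue
	 * recv commands is only safe if no recv completion referencing them
	 * can still be delivered -- which is what the reap above provides,
	 * and what the uaf trace hits without it.
	 */
	ulp_free_cmds(queue);
}

If that matches the intent, the change looks reasonable to me.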