Hi,
While testing Jens' for-next branch I encountered a use-after-free
issue, triggered by test nvmeof-mp/002. This is not the first time I
have seen this issue: I observed it several weeks ago but had not yet
had the time to report it.
That is surprising because this area has not changed for quite a while
now.
CCing linux-rdma as well. I'm assuming that this is with rxe?
Does this happen with siw as well?
Hi Sagi,
This happened with the siw driver. I haven't tried the rxe driver for a
while.
The crash addresses correspond to the following source files and lines:
(gdb) list *(__ib_process_cq+0x11c)
0x7f7c is in __ib_process_cq (drivers/infiniband/core/cq.c:110).
105 budget - completed), wcs)) > 0) {
106 for (i = 0; i < n; i++) {
107 struct ib_wc *wc = &wcs[i];
108
109 if (wc->wr_cqe)
110 wc->wr_cqe->done(cq, wc);
111 else
112 WARN_ON_ONCE(wc->status == IB_WC_SUCCESS);
113 }
114
(gdb) list *(nvme_rdma_create_queue_ib+0x1a7)
0x3d47 is in nvme_rdma_create_queue_ib (drivers/nvme/host/rdma.c:219).
214 {
215 struct nvme_rdma_qe *ring;
216 int i;
217
218 ring = kcalloc(ib_queue_size, sizeof(struct nvme_rdma_qe), GFP_KERNEL);
219 if (!ring)
220 return NULL;
221
222 /*
223 * Bind the CQEs (post recv buffers) DMA mapping to the RDMA queue
(gdb) list *(nvme_rdma_destroy_queue_ib+0x1b8)
0x2388 is in nvme_rdma_destroy_queue_ib (drivers/nvme/host/rdma.c:358).
353 kfree(ndev);
354 }
355
356 static void nvme_rdma_dev_put(struct nvme_rdma_device *dev)
357 {
358 kref_put(&dev->ref, nvme_rdma_free_dev);
359 }
360
361 static int nvme_rdma_dev_get(struct nvme_rdma_device *dev)
362 {
Shouldn't ib_drain_qp() be called before nvme_rdma_destroy_queue_ib()
destroys the QP?
Yes, it absolutely should, and according to the code it is.
The only way this can happen is if something posts a WR after the
drain has started. I can't see how that would happen, though...
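
To make the suspected ordering problem concrete, here is a minimal
userspace sketch. It is not kernel code and not the nvme-rdma or siw
implementation; every name in it (toy_qp, toy_cqe, toy_post, toy_poll,
toy_drain, toy_scenario) is hypothetical. It assumes a drain behaves
like "post a marker and process everything queued ahead of it", and it
counts completions that would have touched an already released
completion context instead of actually dereferencing freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_DEPTH 8

/* Hypothetical stand-in for a work request's completion context. */
struct toy_cqe {
	bool freed;	/* set once the owner has released its resources */
};

/* Hypothetical stand-in for a queue pair: a FIFO of pending requests. */
struct toy_qp {
	struct toy_cqe *pending[TOY_QUEUE_DEPTH];
	size_t head, tail;
};

/* Post a work request; returns false if the queue is full. */
static bool toy_post(struct toy_qp *qp, struct toy_cqe *cqe)
{
	if (qp->tail - qp->head == TOY_QUEUE_DEPTH)
		return false;
	qp->pending[qp->tail++ % TOY_QUEUE_DEPTH] = cqe;
	return true;
}

/*
 * Complete pending requests in FIFO order, stopping after 'marker'
 * (or when the queue is empty if marker is NULL). Returns how many
 * completions would have used an already released context.
 */
static int toy_poll(struct toy_qp *qp, const struct toy_cqe *marker)
{
	int stale = 0;

	while (qp->head != qp->tail) {
		struct toy_cqe *cqe = qp->pending[qp->head++ % TOY_QUEUE_DEPTH];

		if (cqe->freed)
			stale++;
		if (cqe == marker)
			break;
	}
	return stale;
}

/* Drain: post a marker and process everything up to and including it. */
static int toy_drain(struct toy_qp *qp, struct toy_cqe *marker)
{
	bool posted = toy_post(qp, marker);

	/* The caller must leave room in the queue for the marker. */
	assert(posted);
	(void)posted;	/* avoid an unused-variable warning under NDEBUG */
	return toy_poll(qp, marker);
}

/* Run one ordering and return the number of stale completions. */
static int toy_scenario(bool post_after_drain)
{
	struct toy_qp qp = { .head = 0, .tail = 0 };
	struct toy_cqe req = { .freed = false };
	struct toy_cqe late = { .freed = false };
	struct toy_cqe marker = { .freed = false };
	int stale;

	toy_post(&qp, &req);
	if (!post_after_drain)
		toy_post(&qp, &late);	/* queued ahead of the marker */

	stale = toy_drain(&qp, &marker);

	if (post_after_drain)
		toy_post(&qp, &late);	/* suspected bug: post after the drain */

	/* Teardown: the owner releases its completion contexts. */
	req.freed = true;
	late.freed = true;

	/* A later poll still finds whatever was posted after the drain. */
	stale += toy_poll(&qp, NULL);
	return stale;
}
```

In this model toy_scenario(false) returns 0 and toy_scenario(true)
returns 1: only a request posted after the drain can complete against
a released context. That is the pattern that would match a
use-after-free in wc->wr_cqe->done(), assuming the real drain gives the
same ordering guarantee.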