We actually missed a kref_get in nvme_get_ns_from_disk().
This should fix it. Could you help verify?
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4babdf0..b146f52 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -183,6 +183,8 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 	}
 	spin_unlock(&dev_list_lock);
 
+	kref_get(&ns->ctrl->kref);
+
 	return ns;
 fail_put_ns:
Hey Ming. This avoids the crash in nvme_rdma_free_qe(), but now I see another crash:
[ 975.633436] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
[ 978.463636] nvme nvme0: creating 32 I/O queues.
[ 979.187826] nvme nvme0: new ctrl: NQN "testnqn", addr 10.0.1.14:4420
[ 987.778287] nvme nvme0: Got rdma device removal event, deleting ctrl
[ 987.882202] BUG: unable to handle kernel paging request at ffff880e770e01f8
[ 987.890024] IP: [<ffffffffa03a1a46>] __ib_process_cq+0x46/0xc0 [ib_core]
This looks like another instance of freeing the tag set before stopping the QP. I thought we had fixed that once and for all, but perhaps there is some other path we missed. :(
The fix doesn't look right to me. But I wonder how you hit this crash
now? If anything, this would only delay the controller removal...