Hi,
I hit a panic after connecting with the commits below applied; detailed log here:
https://pastebin.com/7z0XSGSd
31fdf18 nvme-rdma: reuse configure/destroy_admin_queue
3f02fff nvme-rdma: don't free tagset on resets
18398af nvme-rdma: disable the controller on resets
b28a308 nvme-rdma: move tagset allocation to a dedicated routine
good 34b6c23 nvme: Add admin_tagset pointer to nvme_ctrl
Is that a reproducible panic? I'm not seeing this at all.
Yes, I can reproduce it every time. The target side was running kernel
4.14.0-rc1 when the panic occurred.
Can you run gdb on nvme-rdma.ko and resolve the faulting address?
(gdb) l *(nvme_rdma_create_ctrl+0x37d)
[root@rdma-virt-01 linux ((31fdf18...))]$ gdb /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Reading symbols from /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko...done.
(gdb) l *(nvme_rdma_create_ctrl+0x37d)
0x297d is in nvme_rdma_create_ctrl (drivers/nvme/host/rdma.c:656).
651 struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
652 struct blk_mq_tag_set *set = admin ?
653 &ctrl->admin_tag_set : &ctrl->tag_set;
654
655 blk_mq_free_tag_set(set);
656 nvme_rdma_dev_put(ctrl->device);
657 }
658
659 static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
660 		bool admin)
(gdb)
Let's take this one step at a time, starting with this issue.
First, there must be a reason why a simple create_ctrl fails; can we
isolate exactly which call fails? Was something else going on that might
have made it fail?
We don't see any "rdma_resolve_addr failed" or "failed to connect queue"
messages, but we do see "creating I/O queues", which means we failed
either at IO tagset allocation or at initializing connect_q.
There is a missing error code assignment in that path, so can you try the
following patch:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..98dd51e630bd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -765,8 +765,10 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	if (new) {
 		ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, true);
-		if (IS_ERR(ctrl->ctrl.admin_tagset))
+		if (IS_ERR(ctrl->ctrl.admin_tagset)) {
+			error = PTR_ERR(ctrl->ctrl.admin_tagset);
 			goto out_free_queue;
+		}
 		ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
 		if (IS_ERR(ctrl->ctrl.admin_q)) {
@@ -846,8 +848,10 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 	if (new) {
 		ctrl->ctrl.tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, false);
-		if (IS_ERR(ctrl->ctrl.tagset))
+		if (IS_ERR(ctrl->ctrl.tagset)) {
+			ret = PTR_ERR(ctrl->ctrl.tagset);
 			goto out_free_io_queues;
+		}
 		ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
 		if (IS_ERR(ctrl->ctrl.connect_q)) {
--
Also, can you add the following debug messages to find out what failed?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..e46475100eea 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -676,6 +676,12 @@ static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl, bool admin)
 	struct blk_mq_tag_set *set = admin ?
 			&ctrl->admin_tag_set : &ctrl->tag_set;
 
+	if (set == &ctrl->tag_set) {
+		pr_err("%s: freeing IO tagset\n", __func__);
+	} else {
+		pr_err("%s: freeing ADMIN tagset\n", __func__);
+	}
+
 	blk_mq_free_tag_set(set);
 	nvme_rdma_dev_put(ctrl->device);
 }
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html