Re: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hi Sagi
With this path, the OOM cannot be reproduced now.

But there is another problem, the reset operation[1] failed at iteration 1007.
[1]
echo 1 >/sys/block/nvme0n1/device/reset_controller

We can relax this a bit by only flushing for admin queue accepts, and
also let the host accept longer time for establishing a connection.

Does this help?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 47a479f26e5d..e1db1736823f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -34,7 +34,7 @@
 #include "fabrics.h"


-#define NVME_RDMA_CONNECT_TIMEOUT_MS   1000            /* 1 second */
+#define NVME_RDMA_CONNECT_TIMEOUT_MS   5000            /* 5 seconds */

#define NVME_RDMA_MAX_SEGMENT_SIZE 0xffffff /* 24-bit SGL field */

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ecc4fe862561..88bb5814c264 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1199,6 +1199,11 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
        }
        queue->port = cm_id->context;

+       if (queue->host_qid == 0) {
+               /* Let inflight controller teardown complete */
+               flush_scheduled_work();
+       }
+
        ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
        if (ret)
                goto release_queue;
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux