On 3/20/2022 3:03 PM, Sagi Grimberg wrote:
These are very long times for a non-debug kernel...
Max, do you see the root cause for this?
Yi, does this happen with rxe/siw as well?
Hi Sagi,
With rxe/siw the reset takes less than 1s.
With rdma_rxe:
# time nvme reset /dev/nvme0
real 0m0.094s
user 0m0.000s
sys 0m0.006s
With siw:
# time nvme reset /dev/nvme0
real 0m0.097s
user 0m0.000s
sys 0m0.006s
This is only reproducible with the mlx IB card. As I mentioned before, the
reset operation time went from 3s to 12s after the commit below; could you
check this commit?
commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc
Author: Max Gurtovoy <maxg@xxxxxxxxxxxx>
Date: Tue May 19 17:05:56 2020 +0300
nvme-rdma: add metadata/T10-PI support
I couldn't repro these long reset times.
It appears to happen when setting up a controller with lots of queues,
maybe?
I'm doing that.
Nevertheless, the above commit added T10-PI offloads.
In this commit, for supported devices we create extra resources in HW
(more memory keys per task).
I suggested doing this configuration as part of the "nvme connect"
command, so that we could avoid this resource allocation by default, but
during the review I was asked to make it the default behavior.
I don't know if I gave you this feedback or not, but it probably didn't
occur to the commenter that it would make connection establishment take
tens of seconds.
It was known that we create more resources than needed for
"non-PI-desired" controllers.
Sagi/Christoph,
WDYT? Should we reconsider the "nvme connect --with_metadata" option?
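If we did, the kernel side would basically be a new fabrics option that the
transport checks before creating the PI resources. A hypothetical sketch
(option name, bit and field are made up here, nothing like this exists
today):

	/* sketch only: made-up "pi_enable" fabrics option */

	/* entry added to opt_tokens[] in drivers/nvme/host/fabrics.c */
	{ NVMF_OPT_PI_ENABLE,	"pi_enable"	},

	/* handling in nvmf_parse_options() */
	case NVMF_OPT_PI_ENABLE:
		opts->pi_enable = true;	/* new bool in struct nvmf_ctrl_options */
		break;

nvme-rdma would then only set queue->pi_support (and allocate the signature
MR pools) when pi_enable was passed on the connect command line.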
Maybe you can make these lazily allocated?
You mean something like:
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd4720d37cc0..367ba0bb62ab 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1620,10 +1620,19 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
}
#ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
- u32 max_integrity_segments)
+static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
{
struct blk_integrity integrity = { };
+ u16 ms = ns->ms;
+ u8 pi_type = ns->pi_type;
+ u32 max_integrity_segments = ns->ctrl->max_integrity_segments;
+ int ret;
+
+ if (ns->ctrl->ops->init_integrity) {
+ ret = ns->ctrl->ops->init_integrity(ns->ctrl);
+ if (ret)
+ return ret;
+ }
switch (pi_type) {
case NVME_NS_DPS_PI_TYPE3:
@@ -1644,11 +1653,13 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
integrity.tuple_size = ms;
blk_integrity_register(disk, &integrity);
blk_queue_max_integrity_segments(disk->queue,
max_integrity_segments);
+
+ return 0;
}
#else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
- u32 max_integrity_segments)
+static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
{
+ return 0;
}
#endif /* CONFIG_BLK_DEV_INTEGRITY */
@@ -1853,8 +1864,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
if (ns->ms) {
if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
-    (ns->features & NVME_NS_METADATA_SUPPORTED))
-        nvme_init_integrity(disk, ns->ms, ns->pi_type,
-                ns->ctrl->max_integrity_segments);
-else if (!nvme_ns_has_pi(ns))
+    (ns->features & NVME_NS_METADATA_SUPPORTED)) {
+        if (nvme_init_integrity(disk, ns))
+                capacity = 0;
+} else if (!nvme_ns_has_pi(ns))
capacity = 0;
}
@@ -4395,7 +4406,7 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
and create the resources for the first namespace we find that is PI formatted?
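On the rdma side, a very rough sketch of what such an ->init_integrity() op
could look like (hypothetical and untested; the ctrl->sig_mrs_allocated flag
is made up, and teardown/reconnect handling is ignored):

static int nvme_rdma_init_integrity(struct nvme_ctrl *nctrl)
{
	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
	int i, ret;

	/* made-up flag: only create the pools once per controller */
	if (ctrl->sig_mrs_allocated)
		return 0;

	for (i = 1; i < nctrl->queue_count; i++) {
		struct nvme_rdma_queue *queue = &ctrl->queues[i];
		/* mirror the sizing done in nvme_rdma_create_queue_ib() */
		u32 pages_per_mr =
			nvme_rdma_get_max_fr_pages(queue->device->dev, true) + 1;

		ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
				      queue->queue_size, IB_MR_TYPE_INTEGRITY,
				      pages_per_mr, pages_per_mr);
		if (ret)
			return ret;
	}

	ctrl->sig_mrs_allocated = true;
	return 0;
}

That would keep reset/connect fast for the common non-PI case and only pay
the allocation cost when a PI-formatted namespace is actually found.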