On 12/14/18 10:54 AM, Hannes Reinecke wrote:
On 12/14/18 10:06 AM, Thadeu Lima de Souza Cascardo wrote:
On Fri, Dec 14, 2018 at 06:56:06AM -0200, Thadeu Lima de Souza
Cascardo wrote:
On Fri, Dec 14, 2018 at 08:47:20AM +0100, Hannes Reinecke wrote:
But you haven't answered my question:
Why can't we patch 'lsblk' to provide the required information even
with the
current sysfs layout?
Just to be clear here. If with 'current sysfs layout' you mean without
any of
the patches we have been talking about, lsblk is not broken. It just
works with
nvme multipath enabled. It will show the multipath paths and simply
ignore the
underlying/hidden ones. If we hid them, we meant for them to be
hidden, right?
What I am trying to fix here is how to find out which PCI
device/driver is
needed to get to the block device holding the root filesystem, which
is what
initramfs needs. And the nvme multipath device is a virtual device,
pointing to
no driver at all, and no relation to its underlying devices, needed
for it to
work.
Well ...
But this is an entirely different proposition.
The 'slaves'/'holders' trick just allows to map the relationship between
_block_ devices, which arguably is a bit pointless here seeing that we
don't actually have block devices for the underlying devices.
But even if we _were_ implementing that you would still fail to get to
the PCI device providing the block devices as there is no link pointing
from one to another.
With the currently layout we have this hierarchy:
NVMe namespace (/dev/nvmeXn1Y) -> NVMe-subsys -> NVMe controller
and the NVMe controller is missing a link pointing to the device
presenting the controller:
# ls -l /sys/devices/virtual/nvme-fabrics/ctl/nvme2
total 0
-r--r--r-- 1 root root 4096 Dec 13 13:18 address
-r--r--r-- 1 root root 4096 Dec 13 13:18 cntlid
--w------- 1 root root 4096 Dec 13 13:18 delete_controller
-r--r--r-- 1 root root 4096 Dec 13 13:18 dev
lrwxrwxrwx 1 root root 0 Dec 13 13:18 device -> ../../ctl
-r--r--r-- 1 root root 4096 Dec 13 13:18 firmware_rev
-r--r--r-- 1 root root 4096 Dec 13 13:18 model
drwxr-xr-x 9 root root 0 Dec 3 13:55 nvme2c64n1
drwxr-xr-x 2 root root 0 Dec 13 13:18 power
--w------- 1 root root 4096 Dec 13 13:18 rescan_controller
--w------- 1 root root 4096 Dec 13 13:18 reset_controller
-r--r--r-- 1 root root 4096 Dec 13 13:18 serial
-r--r--r-- 1 root root 4096 Dec 13 13:18 state
-r--r--r-- 1 root root 4096 Dec 13 13:18 subsysnqn
lrwxrwxrwx 1 root root 0 Dec 3 13:44 subsystem ->
../../../../../class/nvme
-r--r--r-- 1 root root 4096 Dec 13 13:18 transport
-rw-r--r-- 1 root root 4096 Dec 13 13:18 uevent
So what we need to do is to update the 'device' link to point to the PCI
device providing the controller.
(Actually, we would need to point the 'device' link to point to the
entity providing the transport address, but I guess we don't have that
for now.)
And _that's_ what we need to fix; the slaves/holders stuff doesn't solve
the underlying problem, and really shouldn't be merged at all.
Mind you, it _does_ work for PCI-NVMe:
# ls -l /sys/class/nvme/nvme0
total 0
-r--r--r-- 1 root root 4096 Dec 14 11:14 cntlid
-r--r--r-- 1 root root 4096 Dec 14 11:14 dev
lrwxrwxrwx 1 root root 0 Dec 14 11:14 device -> ../../../0000:45:00.0
-r--r--r-- 1 root root 4096 Dec 14 11:14 firmware_rev
-r--r--r-- 1 root root 4096 Dec 14 11:14 model
drwxr-xr-x 12 root root 0 Dec 3 13:43 nvme1n1
drwxr-xr-x 2 root root 0 Dec 14 11:14 power
--w------- 1 root root 4096 Dec 14 11:14 rescan_controller
--w------- 1 root root 4096 Dec 14 11:14 reset_controller
-r--r--r-- 1 root root 4096 Dec 14 11:14 serial
-r--r--r-- 1 root root 4096 Dec 14 11:14 state
-r--r--r-- 1 root root 4096 Dec 14 11:14 subsysnqn
lrwxrwxrwx 1 root root 0 Dec 3 13:43 subsystem ->
../../../../../../class/nvme
-r--r--r-- 1 root root 4096 Dec 14 11:14 transport
-rw-r--r-- 1 root root 4096 Dec 14 11:14 uevent
So it might be as simple as this patch:
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index feb86b59170e..1ecdec6b8b4a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3117,7 +3117,7 @@ nvme_fc_init_ctrl(struct device *dev, struct
nvmf_ctrl_options *opts,
* Defer this to the connect path.
*/
- ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_fc_ctrl_ops, 0);
+ ret = nvme_init_ctrl(&ctrl->ctrl, ctrl->dev, &nvme_fc_ctrl_ops, 0);
if (ret)
goto out_cleanup_admin_q;
As for RDMA / TCP we're running on a network address which really isn't
tied to a specific device, so we wouldn't have any device to hook on
without some trickery.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@xxxxxxx +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)