From: Neerav Parikh <Neerav.Parikh@xxxxxxxxx>

When a VLAN device is removed from the command line with "vconfig rem <vlan-dev>", the network layer sends a NETDEV_UNREGISTER notification to all devices that sit on top of the VLAN device and listen for that notification. The network layer then holds a reference to the device until all holds are released, but it removes the VLAN device's sysfs entries immediately after sending the notification.

When an FCoE interface is configured on top of a VLAN device, the VLAN NETDEV_UNREGISTER handler queues the destruction of the FCoE interface on a delayed workqueue. If SCSI disks discovered through that FCoE interface take part in a multipath setup, removing the VLAN device leaves multipathd unable to remove the individual paths from its internal table, which results in dangling sysfs links and references.

multipathd listens for uevents generated by the kernel, and its uevent listener does receive the removal events for the VLAN device and the tree below it. However, in the 'remove' uevent for a SCSI disk, the DEVPATH is cut off at the VLAN interface name. multipathd therefore cannot find the device in sysfs and never removes it from its internal table.
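The failure, and the kind of fallback that avoids it, can be modeled in a few lines. This is a minimal, self-contained C sketch, not multipathd code: struct toy_path, devpath_leaf(), toy_find_path_by_dev() and toy_should_remove() are hypothetical names, and the in_sysfs flag merely stands in for whether sysfs_device_get() succeeded. It assumes that the kernel name used to look up a path is the leaf node of the DEVPATH.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, heavily simplified stand-in for a multipathd path entry;
 * the real path structure in multipath-tools carries many more fields.
 */
struct toy_path {
	const char *dev;	/* kernel name, e.g. "sdb" */
};

/* Return the last component of a uevent DEVPATH (assumed to be the kernel name). */
static const char *devpath_leaf(const char *devpath)
{
	const char *slash;

	assert(devpath != NULL);
	slash = strrchr(devpath, '/');
	return slash ? slash + 1 : devpath;
}

/* Linear search by kernel name; hypothetical stand-in for a pathvec lookup. */
static const struct toy_path *toy_find_path_by_dev(const struct toy_path *paths,
						   size_t n, const char *dev)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(paths[i].dev, dev) == 0)
			return &paths[i];
	return NULL;
}

/*
 * Decide whether a 'remove' uevent should proceed.
 * in_sysfs models whether the sysfs lookup of the DEVPATH succeeded.
 * Returns 1 to continue with path removal, 0 to bail out.
 */
static int toy_should_remove(int in_sysfs, const char *devpath,
			     const struct toy_path *paths, size_t n)
{
	if (in_sysfs)
		return 1;
	/* sysfs lookup failed: fall back to searching the table by leaf name */
	return toy_find_path_by_dev(paths, n, devpath_leaf(devpath)) != NULL;
}
```

Under these assumptions, a DEVPATH such as /eth2.228-fcoe/ctlr_4/host6/rport-6:0-2/target6:0:0/6:0:0:0/block/sdb that no longer resolves in sysfs still yields the leaf name sdb, which is enough to find the stale path in the table and remove it; a name that is not in the table still causes the event to be rejected.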
Here is an excerpt from the syslog, with multipath debugging enabled, showing the sequence received for the 'remove' uevent:

[snip]
Apr 2 13:22:23 linaut71 multipathd: uevent 'remove' from '/eth2.228-fcoe/ctlr_4/host6/rport-6:0-2/target6:0:0/6:0:0:0/block/sdb'
Apr 2 13:22:23 linaut71 multipathd: UDEV_LOG=3
Apr 2 13:22:23 linaut71 multipathd: ACTION=remove
Apr 2 13:22:23 linaut71 multipathd: DEVPATH=/eth2.228-fcoe/ctlr_4/host6/rport-6:0-2/target6:0:0/6:0:0:0/block/sdb
Apr 2 13:22:23 linaut71 multipathd: SUBSYSTEM=block
Apr 2 13:22:23 linaut71 multipathd: DEVNAME=/dev/sdb
Apr 2 13:22:23 linaut71 multipathd: DEVTYPE=disk
Apr 2 13:22:23 linaut71 multipathd: SEQNUM=2040
Apr 2 13:22:23 linaut71 multipathd: MAJOR=259
Apr 2 13:22:23 linaut71 multipathd: MINOR=589824
Apr 2 13:22:23 linaut71 multipathd: ID_SCSI=1
Apr 2 13:22:23 linaut71 multipathd: ID_VENDOR=EMC
Apr 2 13:22:23 linaut71 multipathd: ID_VENDOR_ENC=EMC\x20\x20\x20\x20\x20
Apr 2 13:22:23 linaut71 multipathd: ID_MODEL=SYMMETRIX
Apr 2 13:22:23 linaut71 multipathd: ID_MODEL_ENC=SYMMETRIX\x20\x20\x20\x20\x20\x20\x20
Apr 2 13:22:23 linaut71 multipathd: ID_REVISION=5874
Apr 2 13:22:23 linaut71 multipathd: ID_TYPE=disk
Apr 2 13:22:23 linaut71 multipathd: ID_SERIAL=360000970000194900586533030303243
Apr 2 13:22:24 linaut71 multipathd: ID_SERIAL_SHORT=60000970000194900586533030303243
Apr 2 13:22:24 linaut71 multipathd: ID_WWN=0x6000097000019490
Apr 2 13:22:24 linaut71 multipathd: ID_WWN_VENDOR_EXTENSION=0x0586533030303243
Apr 2 13:22:24 linaut71 multipathd: ID_WWN_WITH_EXTENSION=0x60000970000194900586533030303243
Apr 2 13:22:24 linaut71 multipathd: ID_SCSI_SERIAL=90058602C000
Apr 2 13:22:24 linaut71 multipathd: ID_BUS=scsi
Apr 2 13:22:24 linaut71 multipathd: ID_PATH=fc-0x50000972c0092919-lun-0
Apr 2 13:22:24 linaut71 multipathd: DEVLINKS=/dev/block/259:589824 /dev/disk/by-id/scsi-360000970000194900586533030303243 /dev/disk/by-path/fc-0x50000972c0092919-l
Apr 2 13:22:24 linaut71 multipathd: /eth2.228-fcoe/ctlr_4/host6/rport-6:0-2/target6:0:0/6:0:0:0/block/sdb: not found in sysfs
Apr 2 13:22:24 linaut71 multipathd: uevent trigger error
[snip]

With the patch below, when sysfs_device_get() fails to fetch the device from sysfs in uev_remove_path(), the code no longer bails out. Instead, it searches multipathd's internal path table (pathvec) for the device itself, i.e. the leaf node of the DEVPATH. If the path is found, removal proceeds as usual; if it is not, the function returns an error as before.

Signed-off-by: Neerav Parikh <Neerav.Parikh@xxxxxxxxx>
---
 multipathd/main.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 5b7195d..74fa8fc 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -553,12 +553,22 @@ uev_remove_path (struct uevent *uev, struct vectors * vecs)
 	dev = sysfs_device_get(uev->devpath);
 	if (!dev) {
 		condlog(2, "%s: not found in sysfs", uev->devpath);
-		return 1;
+		/*
+		 * Seems like we got uevent for a device that does not have
+		 * a valid devpath anymore.
+		 * Check if the device itself is actually present or not.
+		 */
+		condlog(2, "%s: searching path in pathvec", uev->kernel);
+		if (!find_path_by_dev(vecs->pathvec, uev->kernel)) {
+			condlog(2, "%s: path not found in pathvec",
+				uev->kernel);
+			return 1;
+		}
 	}
 
 	condlog(2, "%s: remove path (uevent)", uev->kernel);
 	retval = ev_remove_path(uev->kernel, vecs);
-	if (!retval)
+	if (!retval && dev)
 		sysfs_device_put(dev);
 
 	return retval;

-- 
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel