Hi Ben, I recently encountered a strange and disturbing phenomenon with the latest upstream code. The system in question had two iSCSI connections on separate networks. On one of them, I tested iSCSI disconnect and re-connects, and broken storage configurations (different LUNs diffenent sizes and with the same WWID). The other one was idle. During one test, I observed that multipathd's internal representation of the map representing the *idle* connection had vanished. The map was still present on the device-mapper level, but multipathd seemed to have "forgotten" it completely. The respective paths were tracked as orphans. Here is a log excerpt. multipathd is working on path additions for the map I was testing with (149455...), when a DM event is received. Right after ev_add_path() releases the lock, the idle map (36001405...) is removed (not flushed, as the messages suggest). Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: add path (uevent) # uev_add_path->pathinfo() Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg: udev property SCSI_IDENT_LUN_T10 whitelisted # ev_add_path()->adopt_paths() Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sde: ownership set to 149455400000000006d706174683100000000000000000000 ... # ev_add_path->setup_map() Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: failback = "manual" (setting: multipath internal) # ev_add_path->domap() Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: load table [0 2097152 multipath 0 0 3 1 service-time 0 1 1 8:64 ... # ev_add_path->setup_multipath()->update-multipath_strings Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 149455400000000006d706174683100000000000000000000: disassemble map [0 0 3 1 service-time 0 1 2 8:64 1 1 service-time ... # (!!!!) devmap event for LIO map Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: 36001405d08f748dd4c5481f9ec22b888: devmap event #20 Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdg [8:96]: path added to devmap 149455400000000006d706174683100000000000000000000 # ev_add_path() returns, vecs->lock is released # uev_add_path() goes on to next path, lock currently not held Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdh: add path (uevent) # (!!!!) The LIO map is flushed (36001405d08f748dd4c5481f9ec22b888) # this looks like remove_map() or remove_map_and_stop_waiter() > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdd: orphan path, map flushed > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sysfs prioritizer refcount 2 > Nov 15 22:05:46 sles15-gm-iscsi-03 multipathd[568]: sdc: orphan path, map flushed To me this looks like the map was removed by the DM event handler. More detail can be found on https://bugzilla.opensuse.org/show_bug.cgi?id=1116272. I've also tried to reproduce the problem, with no success so far (actually I haven't observed any dmevents in those reproduction attempts). After encountering this problem, I've scrutinized your polling dmevent handler code over and over last week, but I couldn't spot any possible problems :-) The concern that I have is with the design of the polling dmevent API, in particular the fact that the default action is EVENT_REMOVE. If, for whatever reason, one map is missing from the return value of the DM_DEVICE_LIST ioctl, multipath removes the map immediately, and there's nothing short of a "reconfigure" or "add map" CLI command that would reinstate the map. IOW, we remove the map not on a kernel event saying "this map has been removed", but on receiving a list where this element happens to be missing. We handle this differently for paths, where we wait for a "remove" uevent before we really delete the path from our data structures. (Note that the messages file I saved from the event above shows no sign of such an uevent ever beeing sent - as I said, the dm map was still present after the above occured). What do you think about this? Could you maybe inspect those logs I took, to make sure I didn't get on a totally wrong track here? Regards, Martin -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel