On Tue, 2019-03-19 at 19:44 +0100, Martin Wilck wrote:
> On Tue, 2019-03-19 at 12:11 -0500, Benjamin Marzinski wrote:
> > On Mon, Mar 18, 2019 at 01:12:32PM +0100, Martin Wilck wrote:
> > >
> > > Note also that if a "reconfigure" was carried out in the presence
> > > of paths with changed WWID, the final outcome would likely be the
> > > same that my patch now achieves without "reconfigure".
> >
> > Yeah. I just checked, and this is very broken, and something needs
> > to be done to fix it. If a device gets a change event while it's
> > down, it will no longer have the udev properties necessary to not
> > be blacklisted, so the device gets blacklisted, and ignored. Even
> > after it comes back up and gets another change event to restore
> > these values, multipathd still ignores it, because the device was
> > blacklisted during its add event.
>
> I wasn't aware of that.

We have a general problem in multipath-tools here. Our method of
blacklisting devices that don't have whitelisted udev properties doesn't
fit with the notion that udev may fail to set the properties correctly,
nor with the notion that paths shouldn't be removed or failed (let alone
blacklisted) without good reason. Either we find a way to distinguish
"devices that have incomplete udev information because of temporary
failure" from "devices that are missing required udev properties
permanently", or we must say goodbye to the special treatment of
blacklisting by property.

One obvious thing to do before blacklisting a path is to retry when we
encounter devices with missing properties. We can also check the
fallback UID methods - if they are successful and udev fails repeatedly,
the admin has likely messed up the udev rules.

Ben's approach to ignore WWIDs "changed to 0" at least temporarily makes
a lot of sense in this context. Paths that once had a good WWID should
be given up only after a reasonable number of retries.
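
To make the retry idea above concrete, here is a rough sketch of the
decision logic. It is illustrative Python, not multipath-tools code (which
is C), and every name in it (PathState, check_wwid, Verdict,
MAX_WWID_RETRIES) is hypothetical. The retry limit of 3 is an assumption;
"a reasonable number" is all that is being proposed here.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

# Hypothetical limit; the proposal only asks for "a reasonable number".
MAX_WWID_RETRIES = 3


class Verdict(Enum):
    KEEP = auto()          # udev reported a WWID; keep the path
    RETRY = auto()         # WWID went missing; keep the path, check again later
    MISSING_UDEV = auto()  # never had a WWID; leave to INIT_MISSING_UDEV-style handling
    GIVE_UP = auto()       # retries exhausted for a path that once had a WWID


@dataclass
class PathState:
    wwid: Optional[str] = None  # last good WWID seen for this path
    failures: int = 0           # consecutive checks without a WWID from udev


def check_wwid(path: PathState, udev_wwid: Optional[str]) -> Verdict:
    """Decide what to do with a path, given the WWID udev currently reports."""
    if udev_wwid:
        # A change to a *different* non-empty WWID is a separate problem
        # and is not handled by this sketch; the value is simply recorded.
        path.wwid = udev_wwid
        path.failures = 0
        return Verdict.KEEP
    if path.wwid is None:
        # No valid WWID was ever seen, so there is nothing to hold on to.
        return Verdict.MISSING_UDEV
    # The WWID "changed to 0": remember the old one and count the failure.
    path.failures += 1
    if path.failures > MAX_WWID_RETRIES:
        return Verdict.GIVE_UP
    return Verdict.RETRY
```

With these assumptions, a path that loses its WWID stays in place for three
further checks and is given up on the fourth, while a single successful
check in between resets the counter.
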
Paths for which we'd never seen a valid WWID are treated by the
INIT_MISSING_UDEV logic.

Whatever we do, we should stop trying to "fix" the path WWID in
disassemble_map(). That's *so* against the separation of concerns
principle. In getuid(), we might check if a path with missing WWID is
already part of an existing multipath map, and then set the path WWID
from the map WWID as sort-of a last emergency fallback. But that, too,
should only be done during startup (assuming that a previous multipath
or multipathd instance had set up the map correctly, and that udev
information had been "lost" since then), and only after retrying as
described above.

Note that since by-property blacklisting was introduced in 2013,
significant progress has been made in other areas. We have blacklisting
by transport now, "find_multipaths", the "failed_wwids" logic that
avoids repeated attempts at setting up maps for busy devices, and the
INIT_MISSING_UDEV logic to deal with incomplete initialization. The udev
rules have been improved as well. So, doing away with "required udev
properties" may not be so dangerous, after all.

Thoughts?

Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)