Re: [PATCH 0/3] New approach at handling changed WWIDs

On Tue, 2019-03-19 at 19:44 +0100, Martin Wilck wrote:
> On Tue, 2019-03-19 at 12:11 -0500, Benjamin Marzinski wrote:
> > On Mon, Mar 18, 2019 at 01:12:32PM +0100, Martin Wilck wrote:
> > > 
> > > Note also that if a "reconfigure" was carried out in the presence
> > > of paths with changed WWID, the final outcome would likely be the
> > > same that my patch now achieves without "reconfigure".
> > 
> > Yeah. I just checked, and this is very broken, and something needs
> > to be done to fix it.  If a device gets a change event while it's
> > down, it will no longer have the udev properties necessary to not be
> > blacklisted, so the device gets blacklisted, and ignored.  Even after
> > it comes back up and gets another change event to restore these
> > values, multipathd still ignores it, because the device was
> > blacklisted during its add event.
> 
> I wasn't aware of that.

We have a general problem in multipath-tools here. Our method of
blacklisting devices that don't have whitelisted udev properties
doesn't account for the possibility that udev fails to set the
properties correctly, and it conflicts with the principle that paths
shouldn't be removed or failed (let alone blacklisted) without good
reason.

Either we find a way to distinguish devices that have incomplete udev
information because of a temporary failure from devices that are
permanently missing required udev properties, or we must say goodbye
to the special treatment of blacklisting by property.

One obvious thing to do is to retry when we encounter a device with
missing properties, before blacklisting it. We can also check the
fallback UID methods: if they succeed while udev fails repeatedly, the
admin has likely messed up the udev rules.

Ben's approach of ignoring WWIDs that have "changed to 0", at least
temporarily, makes a lot of sense in this context. Paths that once had
a good WWID should be given up only after a reasonable number of
retries. Paths for which we have never seen a valid WWID are handled
by the INIT_MISSING_UDEV logic.
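The WWID handling described above could be sketched as a small state
machine per path. This is only an illustration under assumptions: the
type and function names, the buffer size and the retry limit are all
hypothetical, and a genuinely different WWID is merely reported, not
handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WWID_SIZE 128     /* hypothetical buffer size */
#define MAX_EMPTY_WWIDS 5 /* hypothetical retry limit */

enum wwid_action {
	WWID_KEEP,         /* valid WWID, same as before (or first one seen) */
	WWID_CHANGED,      /* different valid WWID: a real change, handled elsewhere */
	WWID_HOLD,         /* WWID "changed to 0": keep the old one, retry later */
	WWID_GIVE_UP,      /* still empty after MAX_EMPTY_WWIDS lookups: drop path */
	WWID_INIT_MISSING, /* never had a valid WWID: INIT_MISSING_UDEV territory */
};

/* Hypothetical per-path state. */
struct wwid_state {
	char known[WWID_SIZE];    /* last valid WWID, "" if none seen yet */
	unsigned int empty_count; /* consecutive empty lookups */
};

enum wwid_action on_new_wwid(struct wwid_state *st, const char *new_wwid)
{
	assert(st != NULL && new_wwid != NULL);

	if (new_wwid[0] == '\0') {
		if (st->known[0] == '\0')
			return WWID_INIT_MISSING;
		if (++st->empty_count < MAX_EMPTY_WWIDS)
			return WWID_HOLD;
		return WWID_GIVE_UP;
	}
	st->empty_count = 0;
	if (st->known[0] != '\0' && strcmp(st->known, new_wwid) != 0)
		return WWID_CHANGED; /* st->known is left untouched */
	snprintf(st->known, sizeof(st->known), "%s", new_wwid);
	return WWID_KEEP;
}
```

The key point is that an empty WWID never overwrites a previously seen
good one; only the counter moves.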

Whatever we do, we should stop trying to "fix" the path WWID in
disassemble_map(). That's *so* against the separation of concerns
principle. In getuid(), we might check whether a path with a missing
WWID is already part of an existing multipath map, and if so, set the
path WWID from the map WWID as a last-resort emergency fallback. But
that, too, should only be done during startup (assuming that a
previous multipath or multipathd instance set up the map correctly,
and that the udev information has been "lost" since then), and only
after retrying as described above.
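The conditions for that emergency fallback could be expressed as a
standalone helper that a getuid()-like function might call. This is
not the real getuid(); struct uid_ctx and wwid_from_existing_map() are
hypothetical, and looking up the map that contains the path is not
shown (it is reduced to the map_wwid pointer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define WWID_SIZE 128 /* hypothetical buffer size */

/* Hypothetical inputs; the real code works on struct path. */
struct uid_ctx {
	bool during_startup;    /* multipathd is still in initial discovery */
	bool retries_exhausted; /* udev and fallback UID lookups already retried */
	const char *map_wwid;   /* WWID of an existing map holding this path, or NULL */
};

/* Returns true and fills wwid only if the emergency fallback applies. */
bool wwid_from_existing_map(const struct uid_ctx *ctx, char *wwid, size_t len)
{
	assert(ctx != NULL && wwid != NULL && len > 0);

	if (!ctx->during_startup || !ctx->retries_exhausted)
		return false;
	if (ctx->map_wwid == NULL || ctx->map_wwid[0] == '\0')
		return false;
	/* Trust the map set up by a previous multipath/multipathd instance. */
	snprintf(wwid, len, "%s", ctx->map_wwid);
	return true;
}
```

Keeping this out of the map parsing code means disassemble_map() only
parses, and the decision to trust a map's WWID lives in one place.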

Note that since by-property blacklisting was introduced in 2013,
significant progress has been made in other areas. We now have
blacklisting by transport, "find_multipaths", the "failed_wwids" logic
that avoids repeated attempts to set up maps for busy devices, and the
INIT_MISSING_UDEV logic for dealing with incomplete initialization.
The udev rules have been improved as well. So doing away with
"required udev properties" may not be so dangerous after all.

Thoughts?
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel



