Multipath path classification revisited

Martin Wilck <mwilck@xxxxxxxx> · Wed, 17 Jan 2018 17:27:07 +0100

Here's an attempt to write down the issue from the ground up. Let me know
if I've missed anything in this document, or if you disagree with any of it.

**TL;DR:** Please scroll down to the "Recommendation" section.

# The goal

The goal is to make good decisions about whether a given path is part of a
multipath map, and to make multipath setup "just work". This implies:

 * (Mandatory) multipathing must not harm system stability.

   - Entering emergency mode because of a wrong multipath classification must
     be avoided.
   - Multipath activation shouldn't cause devices or file systems to go
     undetected, even if they aren't required for booting (unless these file
     systems are marked "nofail", emergency mode will be entered anyway).

 * (Important) Known devices that are reachable via multiple paths should be
   detected and set up correctly under multipath. Using only a single path for
   such devices should be avoided.

 * (Nice-to-have) Newly added devices should be classified correctly.

# Blacklisting

The historical approach to the problem has been blacklisting. Users are
supposed to set the list of paths to be multipathed using blacklist and
blacklist exceptions. This works well if done properly.

Unfortunately, getting the blacklist right is not so easy, in particular if it
has to be done on many hosts, and thus I'll restrict the discussion from now
on to a setup without explicit blacklisting by the user. Furthermore, I'll
consider only setups using systemd.

# Critical points in the code flow

There are four places where paths are considered for multipathing:

 - `multipath -u` call in udev rules in initramfs,
 - multipathd in initramfs,
 - `multipath -u` call after switching root,
 - multipathd after switching root.

# Avoiding errors

It's simple: in order to avoid boot errors of the "mandatory"
category, we must make sure that the results for all four points above are the
same, for all paths to a given device. If the classifications differ, various
kinds of problems may arise, from hard-to-even-notice to fatal.
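To make the rule concrete, here is a minimal Python sketch of an agreement
check. It assumes the verdict of each of the four points has already been
collected per path somehow; the data layout, the checkpoint labels, and the
function name `conflicting_wwids` are hypothetical and not part of
multipath-tools.

```python
from __future__ import annotations

# The four classification points listed above (labels are illustrative).
CHECKPOINTS = (
    "multipath -u (initramfs)",
    "multipathd (initramfs)",
    "multipath -u (root FS)",
    "multipathd (root FS)",
)


def conflicting_wwids(wwid_of: dict[str, str],
                      verdicts: dict[str, dict[str, bool]]) -> list[str]:
    """Return the WWIDs whose paths were not classified identically everywhere.

    wwid_of:  path name -> WWID
    verdicts: checkpoint -> {path name -> True if classified as multipath member}
    A path without a verdict at some checkpoint counts as a conflict.
    """
    seen: dict[str, set] = {}
    for checkpoint in CHECKPOINTS:
        for path, wwid in wwid_of.items():
            verdict = verdicts.get(checkpoint, {}).get(path)  # None if missing
            seen.setdefault(wwid, set()).add(verdict)
    # Agreement means exactly one verdict per WWID, across all paths and points.
    return sorted(w for w, v in seen.items() if len(v) != 1 or None in v)
```

A WWID is reported if any of its paths gets a different verdict at any of the
four points, matching the "all four points, all paths" requirement.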

# Agreement between initramfs and booted system

This is also quite simple:

   **Ensure identical configuration between root fs and initrd.**

`multipath.conf`, `config_dir` contents, `wwids` file, udev rules, and command
line parameters have to be identical between initramfs and root FS. Moreover,
all relevant kernel modules need to be available and loaded early in initramfs
(before uevent processing), to avoid errors caused by missing drivers. Also,
the multipathd service and socket must be enabled both in the root FS and in
the initrd.

Unfortunately, that __puts the burden on the user__, who must recreate the initrd
whenever any of the above changes. We have no means to enforce that. One might
consider making the multipath configuration files read-only and creating a
tool such as `visudo` that would recreate the initrd after every change, but
that would be a future project and might not be appreciated by users.

The above needs to be taken with a grain of salt, as only a few config
parameters and command line options actually affect path classification:

 - blacklist and blacklist exceptions
 - `find_multipaths` 
 - options related to WWID detection, `uid_attrs` etc.
 - `-i` option to multipath (`ignore_wwids`)
 - `-n` option to multipathd (`ignore_new_devs`)
 - `wwids`
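The file-based part of this check can be automated. The Python sketch below
compares digests of the multipath configuration files between the root FS and
an initrd, assuming the initrd has already been unpacked into a directory by
some distribution-specific tool. The paths are the usual multipath-tools
defaults and must be adjusted if `config_dir` or `wwids_file` is overridden;
udev rules, command line parameters, and kernel modules are not covered. The
function names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Usual multipath-tools defaults, relative to the file system root (assumption:
# adjust if multipath.conf overrides config_dir or wwids_file).
RELEVANT_FILES = ("etc/multipath.conf", "etc/multipath/wwids")
RELEVANT_DIRS = ("etc/multipath/conf.d",)


def snapshot(root: str | Path) -> dict[str, str]:
    """Map each relevant file that exists under `root` to its SHA-256 digest."""
    root = Path(root)
    candidates = [root / name for name in RELEVANT_FILES]
    for d in RELEVANT_DIRS:
        if (root / d).is_dir():
            # Only *.conf files in config_dir are considered here.
            candidates.extend(sorted((root / d).glob("*.conf")))
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in candidates
        if p.is_file()
    }


def mismatches(root_fs: dict[str, str], initrd: dict[str, str]) -> list[str]:
    """Files that differ, or exist on only one side, between two snapshots."""
    return sorted(n for n in root_fs.keys() | initrd.keys()
                  if root_fs.get(n) != initrd.get(n))
```

For example, `mismatches(snapshot("/"), snapshot("/tmp/initrd"))` would list
the files that need attention, with `/tmp/initrd` standing in for wherever the
initrd was unpacked. An empty result only means these files agree; the other
items above still have to be checked separately.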

## Non-multipathed root

An exception to the rule in the previous section is the use case where only
data partitions (no disks required to boot the root FS) are multipathed. In
this case it's sufficient to make sure that multipathing is off during initrd
processing, and that, after switching root, the root device isn't falsely
classified as a multipath member. The latter can be achieved in various ways:

 - blacklisting
 - find_multipaths
 - not using "ignore_wwids" in udev rules

If any of these is used, it actually doesn't matter whether multipath is
kept out of the initrd or the "equal configuration" rule is followed.

# Agreement between "multipath -u" and multipathd

This is where it gets tricky, because configuration and timing matter.
multipath and multipathd share most of the configuration, so unless
the configuration is modified between the runs of the two executables, we can
focus on just a few parameters.

## find_multipaths=off

This case is quite simple:

**`ignore_wwids` should be used if and only if `ignore_new_devs` is not.**

 1. `ignore_new_devs`=off and `ignore_wwids`=on: all paths will be treated as
    multipath devices by both multipathd and `multipath -u`.
 2. `ignore_new_devs`=on and `ignore_wwids`=off: both multipath and multipathd
    will only consider paths with WWIDs in the wwids file.

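Unfortunately, the current upstream default is `ignore_new_devs`=off and
`ignore_wwids`=off, which is almost certain to lead to trouble: multipathd will
try to set up maps for all paths, while `multipath -u` only claims paths whose
WWIDs are in the wwids file.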

Option 1. is the current SUSE approach.
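The four combinations for `find_multipaths`=off can be checked with a tiny
Python model of the rules stated above. It is a simplification that ignores
blacklisting and WWID detection, and the function names are hypothetical, not
multipath-tools code.

```python
from __future__ import annotations

# Simplified model of the find_multipaths=off rules described above.


def multipath_u_claims(wwid_in_file: bool, ignore_wwids: bool) -> bool:
    """`multipath -u`: with -i every path is claimed, else only known WWIDs."""
    return ignore_wwids or wwid_in_file


def multipathd_claims(wwid_in_file: bool, ignore_new_devs: bool) -> bool:
    """multipathd: with -n only known WWIDs are used, else every path."""
    return wwid_in_file or not ignore_new_devs


def disagreements(ignore_new_devs: bool, ignore_wwids: bool) -> list[bool]:
    """Values of wwid_in_file for which the two tools reach different verdicts."""
    return [known for known in (False, True)
            if multipath_u_claims(known, ignore_wwids)
            != multipathd_claims(known, ignore_new_devs)]
```

Options 1 and 2 yield an empty list. The upstream default (both off), as well
as both on, returns `[False]`: the tools disagree for every path whose WWID is
not in the wwids file.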

## find_multipaths=on

The simple case, again, is

 3. `ignore_new_devs`=on and `ignore_wwids`=off: this behaves like case 2
    above. Users must explicitly add WWIDs in order to have them multipathed.

If `ignore_new_devs`=off, multipathd will try to set up a map for a WWID if and
only if

 - a) it sees more than one path to the WWID, or
 - b) the WWID is referenced in the wwids file.

Setting up the map may fail if one or more paths have already been opened by
something else (FS mounts, LVM, MD, whatever), which can happen if the path was
previously classified as non-multipath.

If `ignore_wwids`=on, `multipath -u` will classify a path as multipath member
if and only if

 - c) it sees more than one path to the WWID, or
 - d) there is already a multipath map referencing the path.

`multipath -u` sees paths before multipathd during udev rule processing, so
d) matters only in the root FS, where a map may already have been set up
in initramfs. Still, d) is an important difference from the behavior of
multipathd, because multipathd (currently, as of 0.7.4) has no such
logic. Conversely, the logic of b) isn't followed by `multipath -u`.

If we insist that multipath and multipathd come to the same conclusion about
a given path in a given situation, it follows that only case 3 above is valid.
This is what the earlier patches 64e27ec and ffbb886 enforce.

It's obvious that `ignore_new_devs` and `ignore_wwids` should be neither both
"on" nor both "off". In either case the applied logic would be just too
different; agreement would be by coincidence only.
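Rules a) to d) can be checked the same way with a small Python model. It
evaluates both tools on the same snapshot of state, so it deliberately ignores
timing differences between udev and multipathd. Following case 3, it assumes
that `ignore_new_devs`=on and `ignore_wwids`=off both reduce to "WWID is in the
wwids file". All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PathState:
    paths_seen: int      # paths to this WWID visible at decision time
    wwid_in_file: bool   # WWID is listed in the wwids file
    map_exists: bool     # a multipath map already references the path


def multipathd_claims(s: PathState, ignore_new_devs: bool) -> bool:
    if ignore_new_devs:
        return s.wwid_in_file
    return s.paths_seen > 1 or s.wwid_in_file        # a) or b)


def multipath_u_claims(s: PathState, ignore_wwids: bool) -> bool:
    if not ignore_wwids:
        return s.wwid_in_file
    return s.paths_seen > 1 or s.map_exists          # c) or d)


ALL_STATES = [PathState(n, f, m)
              for n, f, m in product((1, 2), (False, True), (False, True))]


def disagreeing_states(ignore_new_devs: bool, ignore_wwids: bool) -> list[PathState]:
    """States in which multipathd and `multipath -u` reach different verdicts."""
    return [s for s in ALL_STATES
            if multipathd_claims(s, ignore_new_devs)
            != multipath_u_claims(s, ignore_wwids)]
```

Only case 3 yields no disagreeing state. With `ignore_new_devs`=off and
`ignore_wwids`=on, the tools disagree exactly for a single visible path where
b) and d) differ.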

### ignore_new_devs=off+ignore_wwids=on

Most of this can be fixed by adding case d) to the logic of multipathd, and b) to
the logic of multipath.

What remains is the question of paths being detected one at a time. If we fix
b), we can focus on the case where the WWID is not in the wwids file.

The first `multipath -u` invocation for a given WWID is guaranteed to yield
"non-multipath" (only one visible path). Once multipathd gets to see this
path, the situation may already have changed, because additional paths may
have been detected in the meantime. Follow-up invocations of `multipath -u`
will also see several paths.

Red Hat already has a patch that generates a change event on all paths when
multipathd creates a map. When this event is processed, `multipath -u` will
see the existing map and (re-)classify the paths as multipath members.
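The sequence can be illustrated with a rough Python simulation of paths to one
WWID arriving one at a time. It assumes both tools share the extended rule
(a), b), or d)), and that multipathd creates the map as soon as a second path
is visible and then triggers change events on all known paths, as the Red Hat
patch does. The gap between udev and multipathd processing is ignored, and the
function names are hypothetical.

```python
from __future__ import annotations


def should_claim(paths_seen: int, wwid_in_file: bool, map_exists: bool) -> bool:
    """Extended rule: a) more than one path, b) known WWID, d) existing map."""
    return paths_seen > 1 or wwid_in_file or map_exists


def simulate(paths: list[str], wwid_in_file: bool = False) -> list[tuple[str, str, bool]]:
    """Replay 'add' uevents for paths of one WWID, in arrival order.

    Returns (action, path, claimed) tuples in the order udev would process them.
    """
    log: list[tuple[str, str, bool]] = []
    seen: list[str] = []
    map_exists = False
    for path in paths:
        seen.append(path)
        claimed = should_claim(len(seen), wwid_in_file, map_exists)
        log.append(("add", path, claimed))
        if claimed and not map_exists:
            # multipathd sets up the map, then re-triggers every known path.
            map_exists = True
            log.extend(("change", p, should_claim(len(seen), wwid_in_file, True))
                       for p in seen)
    return log
```

For `simulate(["sda", "sdb"])`, `sda` is first processed as not claimed and
only becomes a member with the later `change` event. In between, the device is
visible to the rest of the system, which is exactly the window in which another
subsystem can grab it.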

The problematic case arises when the first uevent is processed by systemd, as
it will not have `SYSTEMD_READY=0` set. If some other service such as LVM grabs
the device at this point, subsequent attempts to create a multipath map will
fail. If it's DM, the `reassign_maps` option may come to the rescue. But if
something else (MD, a mounted file system or swap, you name it) grabbed the
device, that's impossible. As we currently start multipathd pretty late in the
boot cycle, this problem is highly likely to occur if the device in question
contains metadata that is recognized by higher layers.

Here's an idea of how to fix this: when a path is first encountered, and
`ignore_new_devs`=off+`ignore_wwids`=on, udev rules set a certain property
(e.g. `DM_MULTIPATH_DEVICE_PATH=2`), set `SYSTEMD_READY=0`, and use **systemd-run**
to create a timer that will fire a change event for the same path
at a certain point in time. For that, we need a new config option.

multipathd treats this path as an orphan until additional paths show up,
in which case it will create a map as usual. Nothing special here.

When the timer fires, either the map will have been set up, or `multipath -u`
will see that it's being invoked for the second time and proceed with
`SYSTEMD_READY=1`.
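The proposed flow can be sketched as a decision function in Python. Everything
here is an assumption for illustration: the property value `2` follows the
example above, the 30-second default delay stands in for the new config
option, recognizing the second invocation by the previously stored property is
only one possible approach, and the `systemd-run --on-active=` plus
`udevadm trigger --action=change` command line is one plausible way to schedule
the change event, not an existing multipath-tools feature.

```python
from __future__ import annotations

PENDING = "2"  # hypothetical "undecided" value of DM_MULTIPATH_DEVICE_PATH


def timer_command(devnode: str, delay_s: int) -> list[str]:
    """systemd-run call that re-triggers a change uevent after delay_s seconds."""
    return ["systemd-run", f"--on-active={delay_s}",
            "udevadm", "trigger", "--action=change", devnode]


def classify(props: dict[str, str], devnode: str, paths_seen: int,
             wwid_in_file: bool, map_exists: bool,
             delay_s: int = 30) -> tuple[dict[str, str], list[str] | None]:
    """Return (properties to set, optional timer command) for one uevent.

    props holds the properties udev already stored for the device, which is
    how the second invocation is recognized in this sketch.
    """
    if paths_seen > 1 or wwid_in_file or map_exists:
        return {"DM_MULTIPATH_DEVICE_PATH": "1", "SYSTEMD_READY": "0"}, None
    if props.get("DM_MULTIPATH_DEVICE_PATH") == PENDING:
        # Timer fired, still a single path and no map: release the device.
        return {"DM_MULTIPATH_DEVICE_PATH": "0", "SYSTEMD_READY": "1"}, None
    # First sighting: hide the device from systemd and schedule a re-check.
    return ({"DM_MULTIPATH_DEVICE_PATH": PENDING, "SYSTEMD_READY": "0"},
            timer_command(devnode, delay_s))
```

The device is kept away from systemd for at most one timer period; after that
it is either a multipath member or handed over with `SYSTEMD_READY=1`.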

# Recommendation

The command line options `multipath -i` and `multipathd -n` should be
deprecated and replaced by a config option shared between multipath and
multipathd. As the double negation ("unset ignore_wwids") is sort of
irritating, I propose something like `force_wwids`. This option, if set, would
imply `ignore_new_devs`=on and `ignore_wwids`=off; otherwise, it would imply
`ignore_new_devs`=off and `ignore_wwids`=on. The default value of `force_wwids`
would be "off". In that case, multipath and multipathd should apply exactly the
same logic (cases a), b), and d) above).
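A minimal Python sketch of how the proposed option might map onto the existing
flags, and of the single decision both tools would then share with
`find_multipaths`=on. `force_wwids` is only a proposal, and the names below
are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyOptions:
    ignore_new_devs: bool   # what `multipathd -n` sets today
    ignore_wwids: bool      # what `multipath -i` sets today


def legacy_equivalent(force_wwids: bool) -> LegacyOptions:
    """The flag combination the proposed option would stand for."""
    if force_wwids:
        return LegacyOptions(ignore_new_devs=True, ignore_wwids=False)
    return LegacyOptions(ignore_new_devs=False, ignore_wwids=True)


def claims_path(force_wwids: bool, paths_seen: int,
                wwid_in_file: bool, map_exists: bool) -> bool:
    """Shared verdict for `multipath -u` and multipathd."""
    if force_wwids:
        return wwid_in_file                                  # wwids file only
    return paths_seen > 1 or wwid_in_file or map_exists      # a), b), d)
```

Because both tools would call the same decision with the same option, the
inconsistent combinations discussed above simply cannot be configured.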

Finally, the idea outlined in the previous section, or maybe something better,
should be implemented. And, maybe, we can come up with a user-friendly scheme
to make sure that multipath configuration between initramfs and root FS is in
agreement.

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel