Here's an attempt to write down the issue from the ground up. Let me know if I've missed, or if you disagree with, anything in this document.

**TL;DR:** Please scroll down to the "Recommendation" section.

# The goal

The goal is to make good decisions about whether a given path is part of a multipath map, and to make multipath setup "just work". This implies:

* (Mandatory) Multipathing must not harm system stability.
  - Entering emergency mode because of a wrong multipath classification must be avoided.
  - Multipath activation shouldn't cause devices or file systems to go undetected, even if they're not required for booting (unless these file systems are marked "nofail", emergency mode will be entered anyway).
* (Important) Known devices that are reachable via multiple paths should be detected and set up correctly under multipath. Using only a single path for such devices should be avoided.
* (Nice-to-have) Newly added devices should be classified correctly.

# Blacklisting

The historical approach to the problem has been blacklisting. Users are supposed to set the list of paths to be multipathed using blacklist and blacklist exceptions. This works well if done properly. Unfortunately, getting the blacklist right is not so easy, in particular if it has to be done on many hosts, and thus I'll restrict the discussion from now on to a setup without explicit blacklisting by the user. Furthermore, I'll consider only setups using systemd.

# Critical points in the code flow

There are four places where paths are considered for multipathing:

- `multipath -u` call in udev rules in initramfs,
- multipathd in initramfs,
- `multipath -u` call after switching root,
- multipathd after switching root.

# Avoiding errors

It's simple: in order to avoid boot errors of the "mandatory" category, we must make sure that the results for all four points above are the same, for all paths to a given device.
If the classifications differ, various kinds of problems may arise, from hard-to-even-notice to fatal.

# Agreement between initramfs and booted system

This is also quite simple: **Ensure identical configuration between root fs and initrd.**

`multipath.conf`, `config_dir` contents, `wwids` file, udev rules, and command line parameters have to be equal between initramfs and root FS. Moreover, all relevant kernel modules need to be available and loaded early in initramfs (before uevent processing), to avoid errors caused by missing drivers. Also, the multipathd service/socket must be enabled both in root and initrd.

Unfortunately, that __puts the burden on the user__. The user must recreate the initrd whenever any of the above changes. We have no means to enforce that. One might consider making the multipath configuration files read-only and creating a tool such as `visudo` that would recreate the initrd after every change, but that would be a future project and might not be appreciated by users.

The above needs to be taken with a grain of salt: obviously, only a few config parameters and command line options have an effect on path classification:

- blacklist and blacklist exceptions
- `find_multipaths`
- options related to WWID detection, `uid_attrs` etc.
- `-i` option to multipath (`ignore_wwids`)
- `-n` option to multipathd (`ignore_new_devs`)
- `wwids`

## non-multipathed root

An exception to the rule in the previous section is the use case where only data partitions (no disks required to boot the root FS) are multipathed. In this case it's sufficient to make sure that multipathing is off during initrd processing, and that, after switching root, the root device isn't falsely classified as a multipath member. The latter can be achieved in various ways:

- blacklisting
- `find_multipaths`
- not using `ignore_wwids` in udev rules

If any of these is used, it actually doesn't matter whether multipath is kept out of the initrd or the "equal configuration" rule is followed.
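As a rough aid for the "identical configuration" rule, a check along the following lines could flag files that differ between the root FS and an unpacked initramfs tree. This is only a sketch: the function names are made up for illustration, the relative paths in `RELEVANT` are assumed default locations (they may differ per distribution or `config_dir` setting), and udev rules, kernel modules and command line parameters are not covered.

```python
import hashlib
from pathlib import Path

# Assumed default locations, relative to each root; adjust as needed.
RELEVANT = ["etc/multipath.conf", "etc/multipath/wwids", "etc/multipath/conf.d"]


def digest_tree(root, rel_paths=RELEVANT):
    """Map every relevant file below root to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for rel in rel_paths:
        target = root / rel
        if target.is_dir():
            files = sorted(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            files = [target]
        else:
            files = []  # a missing file simply yields no digest
        for f in files:
            key = f.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(f.read_bytes()).hexdigest()
    return digests


def config_mismatches(rootfs, initrd_tree, rel_paths=RELEVANT):
    """Return relative paths that are missing on one side or differ in content."""
    a = digest_tree(rootfs, rel_paths)
    b = digest_tree(initrd_tree, rel_paths)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Calling `config_mismatches("/", unpacked_dir)` would return an empty list when the compared files agree; the initramfs has to be unpacked into `unpacked_dir` beforehand by whatever means the distribution provides.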
# Agreement between "multipath -u" and multipathd

This is where it gets tricky, because configuration and timing matter. multipath and multipathd share most of the configuration, so unless the configuration is modified between the runs of the two executables, we can focus on just a few parameters.

## find_multipaths=off

This case is quite simple: **`ignore_wwids` should be used if and only if `ignore_new_devs` is not.**

1. `ignore_new_devs`=off and `ignore_wwids`=on: all paths will be treated as multipath devices by both multipathd and `multipath -u`.
2. `ignore_new_devs`=on and `ignore_wwids`=off: both multipath and multipathd will only consider paths with WWIDs in the wwids file.

Unfortunately, the current upstream default is `ignore_new_devs` off and `ignore_wwids` off, which is almost certain to lead to trouble. Option 1. is the current SUSE approach.

## find_multipaths=on

The simple case, again, is

3. `ignore_new_devs`=on and `ignore_wwids`=off: this behaves like 2. above. Users must explicitly add WWIDs in order to have them multipathed.

If `ignore_new_devs`=off, multipathd will try to set up a map for a WWID if and only if

- a) it sees more than one path to the WWID, or
- b) the WWID is referenced in the wwids file.

Setting up the map may fail if one or more paths have already been opened by something else (FS mounts, LVM, MD, whatever), which can happen if the path was classified as non-multipath before.

If `ignore_wwids`=on, `multipath -u` will classify a path as a multipath member if and only if

- c) it sees more than one path to the WWID, or
- d) there is already a multipath map referencing the path.

`multipath -u` sees paths before multipathd during udev rule processing, so d) matters only in the root FS, after a map may have been set up in initramfs already. Anyway, d) is an important difference from the behavior of multipathd, because multipathd (currently, as of 0.7.4) has no such logic. Vice versa, the logic of b) isn't followed by `multipath -u`.
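To make the asymmetry between a)/b) and c)/d) concrete, here is a toy model in Python. It is not taken from the multipath-tools code; the `View` class and both function names are hypothetical, and the model only encodes the four conditions as stated above for `find_multipaths`=on with `ignore_new_devs`=off and `ignore_wwids`=on.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """What the tools can observe at one instant (hypothetical model)."""
    paths_by_wwid: dict = field(default_factory=dict)  # wwid -> list of path names
    wwids_file: set = field(default_factory=set)       # WWIDs recorded in the wwids file
    mapped_paths: set = field(default_factory=set)     # paths referenced by an existing map


def multipathd_wants_map(view, wwid):
    """multipathd with ignore_new_devs=off: condition a) or b)."""
    several_paths = len(view.paths_by_wwid.get(wwid, [])) > 1  # a)
    known_wwid = wwid in view.wwids_file                        # b)
    return several_paths or known_wwid


def multipath_u_claims(view, path, wwid):
    """multipath -u with ignore_wwids=on: condition c) or d)."""
    several_paths = len(view.paths_by_wwid.get(wwid, [])) > 1  # c)
    already_mapped = path in view.mapped_paths                  # d)
    return several_paths or already_mapped
```

With a single visible path whose WWID is in the wwids file, `multipathd_wants_map` is true while `multipath_u_claims` is false; with a single visible path that already belongs to a map, it is the other way around. The two only agree unconditionally once a second path is visible.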
If we insist that multipath and multipathd come to the same conclusion about a given path in a given situation, it follows that only 3. above is valid. This is what the past patches 64e27ec and ffbb886 enforce.

It's obvious that `ignore_new_devs` and `ignore_wwids` should be neither both "on" nor both "off". In both cases the applied logic would be just too different; agreement would be by coincidence only.

### ignore_new_devs=off+ignore_wwids=on

Most of this can be fixed by adding case d) to the logic of multipathd, and b) to the logic of multipath. What remains is the question of paths being detected one at a time.

If we fix b), we can focus on the case where the WWID is not in the wwids file. The first `multipath -u` invocation for a given WWID is guaranteed to yield "non multipath" (only one visible path). Once multipathd gets to see this path, the situation may already have changed, because additional paths may have been detected in the meantime. Follow-up invocations of `multipath -u` will also see several paths.

Red Hat already has a patch that generates a change event on all paths when multipathd creates a map. When this event is processed, `multipath -u` will see the existing map and (re-)classify the paths as multipath members.

The problematic case arises when the first uevent is processed by systemd, as it will not have `SYSTEMD_READY=0` set. If some other service such as LVM grabs the device at this point, subsequent attempts to create a multipath map will fail. If it's DM that grabbed the device, the `reassign_maps` option may come to the rescue. But if something else (MD, mounted file system or swap, you name it) grabbed the device, that's impossible. As we currently start multipathd pretty late in the boot cycle, it's highly likely that this problem occurs if the device in question contains metadata that is recognized by higher layers.
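The one-path-at-a-time problem can be replayed in a small simulation. This is a deliberately pessimistic toy model, not real multipath code: the `replay` function is hypothetical, it assumes b) has already been added to `multipath -u`, that no map exists before multipathd runs (so d) never triggers), and that every path released as "non multipath" is immediately grabbed by some upper layer.

```python
def replay(uevents, wwids_file=()):
    """Replay 'add' uevents in order, then let multipathd try to build maps.

    uevents: list of (path, wwid) tuples in arrival order.
    Returns {wwid: "map created" | "map failed" | "no map"}.
    """
    seen = {}     # wwid -> paths seen so far
    busy = set()  # paths already opened by an upper layer (LVM, MD, mount, ...)

    for path, wwid in uevents:
        seen.setdefault(wwid, []).append(path)
        # multipath -u: claim the path if the WWID is known or >1 path is visible
        claimed = wwid in wwids_file or len(seen[wwid]) > 1
        if not claimed:
            # Worst case: SYSTEMD_READY=0 is not set, so something grabs the path
            busy.add(path)

    result = {}
    for wwid, paths in seen.items():
        # multipathd starts late and applies a) or b) to everything seen so far
        wanted = wwid in wwids_file or len(paths) > 1
        if not wanted:
            result[wwid] = "no map"
        elif any(p in busy for p in paths):
            result[wwid] = "map failed"
        else:
            result[wwid] = "map created"
    return result
```

In this model, two paths to an unknown WWID arriving one after the other end in "map failed", because the first path was released before the second one showed up. The same sequence with the WWID already in the wwids file ends in "map created".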
Here's an idea for how to fix this:

When a path is first encountered, and `ignore_new_devs`=off+`ignore_wwids`=on, udev rules set a certain property (e.g. `DM_MULTIPATH_DEVICE_PATH=2`), set `SYSTEMD_READY=0`, and use **systemd-run** to create a timer that will fire a change event for the same path at a certain point in time. For that we need a new config option.

multipathd treats this path as an orphan until additional paths show up, in which case it will create a map as usual. Nothing special here.

When the timer fires, either the map will have been set up, or multipath will see that it's being invoked for the second time, and proceed with `SYSTEMD_READY=1`.

# Recommendation

The command line options `multipath -i` and `multipathd -n` should be deprecated and replaced by a config option shared between multipath and multipathd. As the double negation ("unset ignore_wwids") is sort of irritating, I propose something like `force_wwids`. This option, if set, would imply `ignore_new_devs`=on and `ignore_wwids`=off; otherwise, the contrary.

The default value of `force_wwids` would be "off". In that case, multipath and multipathd should apply exactly the same logic (a), b), d) above).

Finally, the idea outlined in the previous section, or maybe something better, should be implemented.

And, maybe, we can come up with a user-friendly scheme to make sure that multipath configuration between initramfs and root FS is in agreement.

--
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel