Hi Ben,

I agree with most of your analysis; I've added some replies below. But I'd like to discuss something else first.

I'd like to *simplify the configuration*, and exclude configurations that make no sense.

Before my commits 64e27e and ffbb88 last year, there were 3 settings related to path detection: find_multipaths, -i for multipath (ignore_wwids), and -n for multipathd (ignore_new_devs). This adds up to 8 combinations, which I denote "fin", "FiN", etc. in the following, using upper case for "on" and lower case for "off". The SUSE default setup is "fIn", and the Red Hat / Ubuntu one is "Fin". In the initramfs, Red Hat is effectively using "FiN" ("multipathd -n" isn't used, but strict blacklisting is used to the same effect).

My patches 64e27e and ffbb88 forced F=>N and F=>i, so "FiN" became the only combination with find_multipaths, leaving 5 valid combinations. My recent RFC series allows only the "xiN" and "xIn" combinations (where "x" stands for either "f" or "F") for consistency reasons. But I can see this doesn't fit the way Red Hat and others are setting up multipath, so we need something different.

I wonder if we can agree that the combinations "fIN", "FIN", and "fin" are useless. "IN" combinations are really dangerous and can lead to the fatal outcomes 4A.2, 4B.2, and 4C.2 from your analysis; they shouldn't be allowed. "fin" looks similar to "Fin" at first sight, but without the protection of "find_multipaths", it becomes much more likely that a device that multipath hasn't claimed is grabbed by multipathd later. I think we should disallow it as well, although it's the current upstream default. Moreover, "fiN" and "FiN" are equivalent: if new devices are completely ignored, "find_multipaths yes" has no effect.

If we agree on that, I'd like to propose a new configuration scheme. As in my RFC series, I'd like to replace the command line options with config file options (**).
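The F/I/N notation and the rules above can be made concrete with a small C sketch. It is illustrative only: the struct and function names are hypothetical, not multipath-tools code. It encodes the three settings as booleans, rejects every "xIN" combination and "fin" as proposed, and treats "fiN" as equivalent to "FiN".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the three path-detection settings. */
struct path_policy {
	bool F; /* find_multipaths */
	bool I; /* ignore_wwids, "multipath -i" */
	bool N; /* ignore_new_devs, "multipathd -n" */
};

/* Decode a 3-bit index (F=4, I=2, N=1) into a policy. */
static struct path_policy policy_from_bits(unsigned bits)
{
	struct path_policy p = { (bits & 4u) != 0, (bits & 2u) != 0,
				 (bits & 1u) != 0 };
	return p;
}

static unsigned policy_bits(struct path_policy p)
{
	return (p.F ? 4u : 0u) | (p.I ? 2u : 0u) | (p.N ? 1u : 0u);
}

/* Proposed rule: reject "xIN" (fatal outcomes 4A.2, 4B.2, 4C.2)
 * and "fin" (no find_multipaths protection). */
static bool policy_allowed(struct path_policy p)
{
	if (p.I && p.N)
		return false;
	if (!p.F && !p.I && !p.N)
		return false;
	return true;
}

/* With N set, new devices are ignored and F has no effect: fiN == FiN. */
static struct path_policy policy_normalize(struct path_policy p)
{
	if (p.N)
		p.F = true;
	return p;
}

/* Count allowed combinations, and how many remain distinct
 * after normalization. */
static void count_policies(unsigned *allowed, unsigned *distinct)
{
	bool seen[8] = { false };

	assert(allowed != NULL && distinct != NULL);
	*allowed = *distinct = 0;
	for (unsigned bits = 0; bits < 8; bits++) {
		struct path_policy p = policy_from_bits(bits);

		if (!policy_allowed(p))
			continue;
		(*allowed)++;
		unsigned n = policy_bits(policy_normalize(p));
		if (!seen[n]) {
			seen[n] = true;
			(*distinct)++;
		}
	}
}
```

Enumerating all 8 combinations this way leaves 5 allowed ones ("fiN", "fIn", "Fin", "FiN", "FIn"). Once "fiN" and "FiN" are merged, 4 distinct behaviours remain, one for each of the find_multipaths values proposed next.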
For backward compatibility reasons, I propose to use the "find_multipaths" option, but with 4 rather than 2 possible values:

 - find_multipaths "no": fIn, current SUSE default
 - find_multipaths "yes": Fin, current Red Hat / Ubuntu default
 - find_multipaths "strict": fiN/FiN, use only known WWIDs
 - find_multipaths "auto": FIn, try to be smart; this is what we've been discussing.

Having limited the path detection options to reasonable combinations, we can add more logic to improve the "auto" case, one way or the other.

[(**) "multipath -u -i" might still be allowed for purposes like the one you pointed out for anaconda, or for interactive querying. It would override "ignore_wwids" for the "yes" and "strict" cases.]

Now my reply to your mail.

On Sat, 2018-01-20 at 21:21 -0600, Benjamin Marzinski wrote:
> I apologize in advance for how long this is.

It has to be, it's complex :-) Anyway, I'll skip everything except 4C), because we agree on the rest.

> 4C: If in reality, the device should be multipathed but there is
> something else that also wants to use the device, there are four
> possible outcomes:
>
> 1. The device is not claimed by multipath, and is not multipathed
> 2. The device is claimed by multipath, but not multipathed
> 3. The device is not claimed by multipath, but is multipathed
> 4. The device is claimed by multipath and is multipathed
>
> Outcome 1 is suboptimal, since the device really should be
> multipathed, but the system will still be usable (albeit with only a
> single path to the storage). However, this is fixable for future
> boots, by adding the wwid to the wwids file.

A common case is that users install without multipath, and convert the system to using multipath later. That means dracut is run on a non-multipathed system, where the wwids file doesn't contain the entries for the root FS yet. That's a case which may lead to a fatal variant of 4C.3 later on.
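For reference, the four find_multipaths values proposed at the top of this mail could be parsed and mapped onto the F/I/N notation roughly as follows. This is a minimal sketch: the enum, the arrays, and the helper names are hypothetical, not existing multipath-tools code. "strict" is reported as "FiN" because "fiN" and "FiN" behave the same. The last helper models the (**) note, where "multipath -u -i" overrides ignore_wwids only for "yes" and "strict".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum for the proposed four-valued option. */
enum find_mp {
	FIND_MP_NO,      /* fIn: current SUSE default */
	FIND_MP_YES,     /* Fin: current Red Hat / Ubuntu default */
	FIND_MP_STRICT,  /* fiN/FiN: use only known WWIDs */
	FIND_MP_AUTO,    /* FIn: try to be smart */
	FIND_MP_INVALID, /* unparsable value */
};

static const char *const find_mp_names[] = { "no", "yes", "strict", "auto" };

/* Flag combination per value, upper case = on. */
static const char *const find_mp_combos[] = { "fIn", "Fin", "FiN", "FIn" };

static enum find_mp parse_find_mp(const char *s)
{
	for (int i = 0; i < FIND_MP_INVALID; i++)
		if (strcmp(s, find_mp_names[i]) == 0)
			return (enum find_mp)i;
	return FIND_MP_INVALID;
}

static const char *find_mp_combo(enum find_mp v)
{
	assert(v != FIND_MP_INVALID);
	return find_mp_combos[v];
}

/* Effective ignore_wwids for the multipath tool: "no" and "auto" have
 * I on anyway; "-u -i" turns it on for "yes" and "strict". */
static bool multipath_ignores_wwids(enum find_mp v, bool cmdline_i)
{
	const char *c = find_mp_combo(v);

	return c[1] == 'I' ||
	       (cmdline_i && (v == FIND_MP_YES || v == FIND_MP_STRICT));
}
```

Keeping the override in a separate helper means that "strict" plus "-i" never turns into a persistent "FIN" configuration. It only affects a single invocation of the multipath tool.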
Along similar lines, it's essential for the Red Hat "multipath-hostonly" approach that no service in the initrd grabs devices which might be multipathed later. If that happens, a fatal form of 4C.3 can occur. We see this often with BTRFS + subvolumes. But initrd issues are out of scope for the current discussion, I guess.

> Outcome 2 is just as bad as Outcome 2 in class 4A. Of course, if the
> device is supposed to be multipathed, and is claimed by multipath, it
> is very likely that multipathd will assemble on it, so this is an
> extremely rare case.

Certainly. This is why "xIN" should be avoided (see above).

> Outcome 3 is the cause of the never actually observed bug I explained
> in an earlier email.

We did observe this, but the fatal cases were usually related to initrd/root FS configuration inconsistencies (see above). But then, SUSE normally works with "fIn", where things are a little different.

> [...]
>
> RedHat's current solution guarantees that you always get Outcome 1 for
> 4A devices, Outcome 3 for 4B devices, and either Outcome 1 or Outcome 3
> for 4C devices (however in practice, 4C Outcome 3 has never been
> reported).
>
> SUSE's "imply -n on find_multipaths" solution guarantees that you
> always get Outcome 1 for 4A devices, Outcome 1 for 4B devices, and
> Outcome 1 for 4C devices.
>
> Hopefully we agree on the above analysis. If you think I'm wrong in
> part of it, please let me know, because this is what I'm reasoning
> from. Now on to your and my proposed solutions.

All of this made sense to me. I made a similar write-up for myself.

> Your proposed solution guarantees that you always get Outcome 1 for
> 4A devices.
>
> After that it gets a little trickier. Your solution involves a
> timeout, and that timeout can delay booting if there are 4A devices.
> Even if we do the equivalent of "multipath -n" in the initramfs, there
> are often still filesystems that need to mount after we switch-root.
> Those will get delayed, and the machine may not be usable until they
> are mounted. I really do feel that this will not be a rare case at
> all. You pointed out that this can be dealt with by decreasing the
> timeout, even all the way to 0. I think that since this timeout is
> protecting against a problem in the rare case, by making the common
> case slower, users will be very inclined to decrease it. Thus, it's
> worth looking at what happens in the case where the timeout is long
> enough for multipathd to assemble the device, and the case where it is
> not long enough.

Yes, the problem is that for large multipath installations and/or SANs with slow device detection, the timeout has to be large to avoid "false negatives"; but a large timeout would delay booting unacceptably on systems with single-path devices.

My idea for solving this is to make the timeout configurable through multipath.conf and the hwtable, with extra logic to use a *very* small timeout (1s, or no waiting at all) if a device is listed in neither the hwtable nor the config file. Thus the typical SAS or SATA devices of non-multipath OS installations wouldn't be waited for. That should address your main critique.

> My solution idea is basically a mirror of yours.
>
> At a high level, your solution is:
> When you see a "maybe" device, assume it's a "yes" and claim it so
> that nothing else can use the device. Then, set a timeout for
> multipathd to make use of the device. If that timeout passes, and
> multipathd hasn't used the device, go back and unclaim the device so
> that it's in the correct state. Then, if something else should use the
> device, it can.
>
> At a high level, my solution is:
> When you see a "maybe" device, assume it's a "no" and don't claim it.
> Also, disallow multipathd from using the device. Then, set a timeout
> for other things to make use of the device.
> When that timeout passes, multipathd is no longer disallowed from
> using that device, so that if multipathd should use the device, it
> can. If multipathd uses the device, go back and claim the device, so
> it's in the correct state.

How would you prevent multipathd from using the device? By setting a udev property? And why would you do it? Don't you agree that, as soon as a second path is encountered, multipathd should be allowed to grab both? Maybe I misunderstood, and multipathd will only be forbidden to use the path as long as there's only one? But no, with "find_multipaths on", multipathd wouldn't grab a single path anyway... I'm a bit confused.

Along similar lines as you argued about my approach, by delaying multipathd's actions you'd increase the probability of the suboptimal outcome 1). And you're opening up the time window in which both multipathd and other layers can grab the device. That may not be so bad in practice, as you say, but it still bothers me on principle. Finally, as you said yourself, multipathd is likely to "lose the race" anyway. Your patch just makes its chances even smaller.

In a way, d7188fc "multipathd: start daemon after udev trigger" already implements your idea, because by the time multipathd starts, essential device detection will be finished (except for extremely slow device detection, where the udev queue runs empty before all devices have appeared).

> The advantage of your method is that, as long as the timeout is long
> enough, you always do the correct thing with multipath devices. The
> disadvantage is that the timeout slows down the common case, to make
> the rare case correct.

Would the idea with variable timeouts improve my approach in your eyes?

> The advantage of my method is that it only slows down the rare case.
> The disadvantage is that it will not get the "Nice-to-have" outcome in
> the rare case.
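To put the two mirrored approaches side by side, here is a minimal C model of a single "maybe" device. Everything in it is hypothetical and not taken from multipath-tools: the struct, the function names, and the 10s default. Here, "claimed" stands for setting udev properties such as DM_MULTIPATH_DEVICE_PATH=1. The timeout is chosen per device, following the variable-timeout idea above: a configured value for devices listed in multipath.conf or the hwtable, 1s otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one "maybe" device; not multipath-tools code. */
#define DEFAULT_TIMEOUT  10 /* seconds, illustrative value only */
#define UNLISTED_TIMEOUT  1 /* "very small" timeout for unlisted devices */

enum strategy {
	CLAIM_FIRST,   /* Martin: claim now, unclaim if multipathd doesn't use it */
	RELEASE_FIRST, /* Ben: don't claim, keep multipathd off until timeout */
};

struct maybe_dev {
	int timeout;         /* seconds until the timer fires */
	bool claimed;        /* e.g. DM_MULTIPATH_DEVICE_PATH=1 was set */
	bool mpathd_allowed; /* may multipathd set up a map on it? */
	bool multipathed;    /* multipathd has set up a map on it */
};

/* Variable timeout: devices in neither multipath.conf nor the hwtable
 * (typical single-path SAS/SATA disks) are hardly waited for.
 * configured < 0 means "no per-device value set". */
static int pick_timeout(bool listed, int configured)
{
	if (!listed)
		return UNLISTED_TIMEOUT;
	return configured >= 0 ? configured : DEFAULT_TIMEOUT;
}

/* State right after the "maybe" uevent has been processed. */
static struct maybe_dev on_maybe(enum strategy s, bool listed, int configured)
{
	struct maybe_dev d = { .timeout = pick_timeout(listed, configured) };

	d.claimed = (s == CLAIM_FIRST);
	d.mpathd_allowed = (s == CLAIM_FIRST);
	return d;
}

/* multipathd tries to use the device; returns true on success. */
static bool on_multipathd_grab(struct maybe_dev *d)
{
	assert(d != NULL);
	if (!d->mpathd_allowed)
		return false;
	d->multipathed = true;
	d->claimed = true; /* RELEASE_FIRST: claim after the fact */
	return true;
}

/* The per-device timer expires. */
static void on_timeout(struct maybe_dev *d)
{
	assert(d != NULL);
	if (!d->multipathed)
		d->claimed = false;   /* CLAIM_FIRST: give the device back */
	d->mpathd_allowed = true; /* RELEASE_FIRST: lift the block */
}
```

With CLAIM_FIRST, an unlisted single-path disk is held for only 1s before it is released. With RELEASE_FIRST, multipathd's first attempt fails and succeeds only after the timer fires, which is the "rare case" slowdown described above. In this model, the two approaches differ only in the initial state set by on_maybe().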
> I'm working on coding up my solution, which includes a number of the
> patches from your solution, but I'm leaving tomorrow for a week of
> meetings and conferences, so it might be a little while in coming.

Looking forward to it.

Btw, it just occurred to me that your approach could be implemented in exactly the same way as mine. Basically, all we need to change is what udev properties get set on the "maybe" uevents. Take my code, but don't set SYSTEMD_READY=0 and DM_MULTIPATH_DEVICE_PATH=1 in the "maybe" case... Should work, no?

Cheers,
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel