On 2/12/24 12:09, Martin Wilck wrote: > On Mon, 2024-02-12 at 10:51 +0100, Peter Rajnoha wrote: >> On 2/9/24 22:58, Martin Wilck wrote: >>> Hi Zdenek, Peter, David, Ben, >>> >>> I have been pondering the device-mapper udev rules a lot lately, >>> and I >>> believe I have found glitches in the logic of the device mapper >>> udev >>> flags that I'd like to bring to your attention. >>> >>> TL;DR: change I propose: >>> >>> * use DM_DISABLE_OTHER_RULES_FLAG consistently as a flag meaning >>> "upper layers should leave this device alone until told >>> otherwise", >>> saved between uevents in the udev db >>> * use DM_SUSPENDED consistently as a flag meaning "upper layers >>> should keep their hands off this device temporarily", not saved >>> in the udev db. >>> * don't use any other flags in upper layers, including 13-dm- >>> disk.rules. >>> >>> This implies: >>> >>> * stop setting DM_DISABLE_OTHER_RULES_FLAG from DM_SUSPENDED >>> in 10-dm.rules >>> * check DM_DISABLE_OTHER_RULES_FLAG instead of DM_NOSCAN in >>> 13-dm-disk.rules >>> * check DM_SUSPENDED in 69-dm-lvm.rules, 80-udisks2.rules >>> * stop using DM_NOSCAN in 11-dm-mpath.rules and 66-kpartx.rules >> >> Hi Martin! >> >> It is surely possible we could do some improvements and optimizations >> here - all the rules are a result of long evolution where we were >> fixing >> various issues on the way. So if we could shoot down some >> complexities, >> that would be great... >> >> On the other hand, we need to be really careful, because even a tiny >> change here can cause troubles if we omit some case. >> >> I'll surely go through your suggestions, thanks for diving deep into >> that! Just please give me some time so I'll remap all the paths >> through >> the rules in my head :) >> >> Note that we're working with 3 levels of logical rule separation >> here: >> >> 1) DM base (10-dm.rules, 13-dm-disk.rules, 95-dm-notify.rules) >> 2) DM subsystem (11-dm-lvm.rules, the mpath rules, ...) >> 3) all the "other" >> >> As for the very original intentions of the >> DM_UDEV_DISABLE_OTHER_RULES_FLAG - this one was supposed to be the >> *only >> one* to check for in "other" rules so they don't need to bother about >> checking other states/variables ("other" here means other than DM and >> any DM subsystem). One simple variable to check for others - an ideal >> - >> otherwise, we would be having a hard time to persuade others why they >> need to add complex checks in their rules/applications. > > Yes, I understand that, and it makes a lot of sense. > > The problem arises by saving and restoring to/from the udev db. We've > got different possible *reasons* (inputs) for a device becoming > unusable. The value we communicate to upper layers should arguably be a > "logical or" of all these. But then we can't use a single variable to > save and restore the state; we must determine the value of all "reason" > flags (either directly or by restoring from the db, as appropriate for > the flag in question) and "or"-them into a single "dontuse" flag. > > Example: Currently, we set DM_UDEV_DISABLE_OTHER_RULES_FLAG if > DM_SUSPENDED is set. If the next spurious uevent arrives, we see > DM_UDEV_DISABLE_OTHER_RULES_FLAG set, but we don't know if the origin > was DM_SUSPENDED or a genuine udevflag. In the former case, the device > would now be usable, while in the latter case it most probably > wouldn't. For now, I've posted a patch to fix this for the multipath > rules [1], but I'd like to see it fixed in the general DM rules. > Right, underneath within DM/DM-subystem, we should be able to keep and restore those reasons for why it has been flagged with DM_UDEV_DISABLE_OTHER_RULES_FLAG till now and when the state is good enough that we can drop it, we would do it transparently for higher (non-dm and non-dm-subsystem) layers. So if there's a case that is not currently handled by 10-dm.rules or 11-dm-<subsystem>.rules, we can fix that there. If it's a generic rule that applies to all DM, not just subystem, then yes, we can move that to 10-dm.rules (will have a look at your patch [1]). >> Yes, there's the unlucky exception in the 99-systemd.rules which >> needs >> to check the DM_SUSPENDED to import the "SYSTEMD_READY" from udev db, >> otherwise we had been running into an issue with stopping systemd >> device >> units prematurely (actually, this was an issue spotted much much >> later >> on after years of using the rules without this rule). >> And we haven't >> found a better way to check for this condition. This is mainly >> because >> systemd is also a "device" manager/tracker in some way through its >> "device" units, besides udev itself. I simply gave up there and >> admitted >> the check for DM_SUSPENDED case. > > Well, IIUC the main reason that systemd couldn't use > DM_UDEV_DISABLE_OTHER_RULES_FLAG was at least in part due to the fact > that the multipath rules used it in a special way that was inconsistent > with the rest of DM ;-) Aha! Well, honestly, I don't remember the exact details and context of that fix, but I know we haven't found a better way... > > I think there are 3 variants of "unusable": > > a) temporarily unusable (just for this event), ignore this uevent and > restore previous properties from db. > b) unusable, avoid IO, don't scan, don't activate (this is how we use > DM_UDEV_DISABLE_OTHER_RULES_FLAG). Upper layers will usually load saved > properties from udev db in this case, too. > c) like b), but also try to unmount / unconfigure if already used. This > is SYSTEMD_READY=0. I don't think DM has a flag with these semantics at > this time. I can imagine such a flag being set if a device was reloaded > with an incompatible table, but that's rather a corner case. > > It's an honorable goal to condense everything into a single variable > for consumer rules, but I think it doesn't work if we want the upper > layers to be able to distinguish these. We can merge a) and b) I think, > because their meaning for upper layers is practically the same, if we > get the saving and restoring right. > The c) case - well, it's questionable what should be done in that case, because that means we have literally cut off the underlying device while it was still in use. Any further IO from higher layers will return IO errors, will queue IOs or, in the worse case, issue IOs to a device that we don't want to anymore. Anyway, if I understand correctly, we simply need to signal higher layers either: A) device is unusable, forget it and clear all your current extra records you have about this device (including removing any custom symlinks for udev). That would also map to SYSTEMD_READY=0. B) device is unusable temporarily, restore any records you need, wait for the DM_UDEV_DISABLE_OTHER_RULES_FLAG=1 to drop to 0 (or being unset). What do you think about keeping a single DM_UDEV_DISABLE_OTHER_RULES_FLAG for this, just having a different value, say "2" to denote the B case? Otherwise, we need 2 distinct variables (which is harder for others to accept I bet). >> The DM_NOSCAN was actually just a helper and a more human readable >> name >> for "DM_SUSBYSTEM_UDEV_FLAG0" within LVM subsystem *only* at first. >> It is used during LV initialization - the wiping/zeroing of the LV >> before it is pronounced as usable - that's why, when it is set, we >> signal to "others" the DM_UDEV_DISABLE_OTHER_RULES_FLAG based on this >> flag. However, since we have the 13-dm-disk.rules which manages the >> blkid call for DM devices (and which is ours - owned by DM), we need >> to >> signal these rules to avoid calling blkid (as it could see >> uninitiliazed/stale data). In this case, we import any previous scan >> results from udev db. It's not like "completely ignore and skip rules >> for this event". > > I'm not sure if I understand what you mean with "completely ignore and > skip". Upper-level rules usually can't "completely ignore" an uevent if > they need to preserve any udev properties across the current event. If > the device is unusable in the weaker "don't try IO" sense, the upper > rules must import existing properties from the db, just like 13-dm- > disk.rules, lest the properties be forgotten by udev. IMO this weaker > semantics (corresponding to b) above) is what matters most for upper > level rules. I didn't consider that there might be extra records to keep around for anything else than blkid (13-dm-disk.rules) and DM/DM subsys (10-dm and 11-dm-subsystem.rules). But you're probably right here - differentiating the A) and B) from above could be beneficial for some layers above. If nothing else, then at least the systemd's SYSTEMD_READY case in 99-systemd.rules. If we could still keep the single DM_UDEV_DISABLE_OTHER_RULES_FLAG for others to check, I'd really like to have only that one, not adding more. > >> Other DM subsystems may get into exactly the same >> situation with initialization as LVM, so that's why it ended up as >> generic "DM_NOSCAN" so any 11-dm-<subsystem>.rules can use it to >> signal >> 13-dm-disk.rules to avoid the blkid scan this way (the "SCAN" here >> actually means "blkid scan" so the better name would be >> "DM_NO_BLKID_SCAN" probably). The "DM_NOSCAN" is not meant to be >> consumed by "others" at all. > > If blkid is likely to fail or retrieve stale data, other tools that > read content from the device are likely to fail, too, and that's why > they currently consume this variable. I agree that they shouldn't, but > that works only if DM_UDEV_DISABLE_OTHER_RULES_FLAG has "don't do IO" > semantics, as opposed to "completely ignore and skip". > >> The other way round, we also don't want (ideally) to check for >> DM_UDEV_DISABLE_OTHER_RULES flag in DM and DM subysystem rules. This >> flag is meant for "others" to check. So we have a clear separation of >> what is signalled withing "DM world" and "the other world". >> >> The DM_SUSPENDED flag - well, this was supposed to be just internal >> to >> DM and DM subystem rules. All the "others" have the >> DM_UDEV_DISABLE_OTHER_RULES_FLAG - they simply don't need to bother >> for >> what exact reason the device is not usable, whether it is a suspended >> case or whatever else... > > I understand. I agree that non-dm rules shouldn't use DM_SUSPENDED and > DM_NOSCAN. But this requires clean handling of save/restore, and that's > what we currently don't do right, IMO. > > Thanks, > Martin > -- Peter