On Tue, 2021-07-13 at 01:05 +0200, Matteo Croce wrote: > From: Matteo Croce <mcroce@xxxxxxxxxxxxx> > > Associating uevents with block devices in userspace is difficult and racy: > the uevent netlink socket is lossy, and on slow and overloaded systems has > a very high latency. Block devices do not have exclusive owners in > userspace, any process can set one up (e.g. loop devices). Moreover, device > names can be reused (e.g. loop0 can be reused again and again). A userspace > process setting up a block device and watching for its events cannot thus > reliably tell whether an event relates to the device it just set up or > another earlier instance with the same name. > > Being able to set a UUID on a loop device would solve the race conditions. > But it does not allow to derive orderings from uevents: if you see a uevent > with a UUID that does not match the device you are waiting for, you cannot > tell whether it's because the right uevent has not arrived yet, or it was > already sent and you missed it. So you cannot tell whether you should wait > for it or not. > > Being able to set devices up in a namespace would solve the race conditions > too, but it can work only if being namespaced is feasible in the first > place. Many userspace processes need to set devices up for the root > namespace, so this solution cannot always work. > > Changing the loop devices naming implementation to always use > monotonically increasing device numbers, instead of reusing the lowest > free number, would also solve the problem, but it would be very disruptive > to userspace and likely break many existing use cases. It would also be > quite awkward to use on long-running machines, as the loop device name > would quickly grow to many-digits length. > > Furthermore, this problem does not affect only loop devices - partition > probing is asynchronous and very slow on busy systems. It is very easy to > enter races when using LO_FLAGS_PARTSCAN and watching for the partitions to > show up, as it can take a long time for the uevents to be delivered after > setting them up. > > Associating a unique, monotonically increasing sequential number to the > lifetime of each block device, which can be retrieved with an ioctl > immediately upon setting it up, allows to solve the race conditions with > uevents, and also allows userspace processes to know whether they should > wait for the uevent they need or if it was dropped and thus they should > move on. > > This does not benefit only loop devices and block devices with multiple > partitions, but for example also removable media such as USB sticks or > cdroms/dvdroms/etc. > > The first patch is the core one, the 2..4 expose the information in > different ways, and the last one makes the loop device generate a media > changed event upon attach, detach or reconfigure, so the sequence number > is increased. > > If merged, this feature will immediately used by the userspace: > https://github.com/systemd/systemd/issues/17469#issuecomment-762919781 > > v4 -> v5: > - introduce a helper to raise media changed events > - use the new helper in loop instead of the full event code > - unexport inc_diskseq() which is only used by the block code now > - rebase on top of 5.14-rc1 > > v3 -> v4: > - rebased on top of 5.13 > - hook the seqnum increase into the media change event > - make the loop device raise media change events > - merge 1/6 and 5/6 > - move the uevent part of 1/6 into a separate one > - drop the now unneeded sysfs refactor > - change 'diskseq' to a global static variable > - add more comments > - refactor commit messages > > v2 -> v3: > - rebased on top of 5.13-rc7 > - resend because it appeared archived on patchwork > > v1 -> v2: > - increase seqnum on media change > - increase on loop detach > > Matteo Croce (6): > block: add disk sequence number > block: export the diskseq in uevents > block: add ioctl to read the disk sequence number > block: export diskseq in sysfs > block: add a helper to raise a media changed event > loop: raise media_change event > > Documentation/ABI/testing/sysfs-block | 12 ++++++ > block/disk-events.c | 62 +++++++++++++++++++++------ > block/genhd.c | 43 +++++++++++++++++++ > block/ioctl.c | 2 + > drivers/block/loop.c | 5 +++ > include/linux/genhd.h | 3 ++ > include/uapi/linux/fs.h | 1 + > 7 files changed, 114 insertions(+), 14 deletions(-) For the series: Tested-by: Luca Boccassi <bluca@xxxxxxxxxx> I have implemented the basic systemd support for this (ioctl + uevent, sysfs will be done later), and tested with this series on x86_64 and Debian 11 userspace, everything seems to work great. Thanks Matteo! Here's the implementation, in draft state until the kernel side is merged: https://github.com/systemd/systemd/pull/20257 -- Kind regards, Luca Boccassi
Attachment:
signature.asc
Description: This is a digitally signed message part