On Thu, Jun 08, 2023 at 11:11:44AM +0200, Roger Pau Monné wrote:
> On Wed, Jun 07, 2023 at 12:29:26PM -0400, Demi Marie Obenour wrote:
> > On Wed, Jun 07, 2023 at 10:44:48AM +0200, Roger Pau Monné wrote:
> > > On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> > > > On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > > > > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > > > > Set "opened" to "0" before the hotplug script is called. Once the
> > > > > > device node has been opened, set "opened" to "1".
> > > > > >
> > > > > > "opened" is used exclusively by userspace. It serves two purposes:
> > > > > >
> > > > > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > > > > >
> > > > > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > > > > >    Once "opened" is 1, blkback has a reference to the device, so
> > > > > >    userspace doesn't need to keep one.
> > > > > >
> > > > > > Together, these changes allow userspace to use block devices with
> > > > > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > > > > set or device-mapper devices with the deferred-remove flag set.
> > > > >
> > > > > There was some work in the past to allow reloading blkback as a
> > > > > module, it's clear that using delete-on-close won't work if attempting
> > > > > to reload blkback.
> > > >
> > > > Should blkback stop itself from being unloaded if delete-on-close is in
> > > > use?
> > >
> > > Hm, maybe. I guess that's the best we can do right now.
> >
> > I’ll implement this.
>
> Let's make this a separate patch.

Good idea.

> > > > > Isn't there some existing way to check whether a device is opened?
> > > > > (stat syscall maybe?).
> > > >
> > > > Knowing that the device has been opened isn’t enough. The block script
> > > > needs to be able to wait for blkback (and not something else) to open
> > > > the device. Otherwise it will be confused if the device is opened by
> > > > e.g. udev.
> > >
> > > Urg, no, the block script cannot wait indefinitely for blkback to open
> > > the device, as it has an execution timeout. blkback is free to only
> > > open the device upon guest frontend connection, and that (when using
> > > libxl) requires the hotplug scripts execution to be finished so the
> > > guest can be started.
> >
> > I’m a bit confused here. My understanding is that blkdev_get_by_dev()
> > already opens the device, and that happens in the xenstore watch
> > handler. I have tested this with delete-on-close device-mapper devices,
> > and it does work.
>
> Right, but on a very contended system there's no guarantee of when
> blkback will pick up the update to "physical-device" and open the
> device, so far the block script only writes the physical-device node
> and exits. With the proposed change the block script will also wait
> for blkback to react to the physical-device write, hence making VM
> creation slower.

Only block scripts that choose to wait for the device to be opened
suffer this performance penalty. My current plan is to do so only for
delete-on-close devices that are managed by the block script itself.
Other devices will not suffer a performance hit.

In the long term, I would like to solve this problem entirely by using
an ioctl to configure blkback. The ioctl would take a file descriptor
argument, avoiding the need for a round-trip through xenstore. This
also solves a security annoyance with the current design, which is that
the device is opened by a kernel thread, so the security context of
whoever requested the device to be opened is lost.

> > > > > I would like to avoid adding more xenstore blkback state if such
> > > > > information can be fetched from other methods.
> > > >
> > > > I don’t think it can be, unless the information is passed via a
> > > > completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> > > > this information should not be stored in Xenstore at all, as it exposes
> > > > backend implementation details to the frontend.
> > >
> > > Could you maybe use sysfs for this information?
> >
> > Probably? This would involve adding a new file in sysfs.
> >
> > > We have all sorts of crap in xenstore, but it would be best if we can
> > > see about placing stuff like this in another interface.
> >
> > Fair.
>
> Let's see if that's a suitable approach, and we can avoid having to
> add an extra node to xenstore.

I thought about this some more and realized that in Qubes OS, we might
want to include the diskseq in the information dom0 gets about each
exported block device. This would allow dom0 to write the xenstore node
itself, but it would require some way for dom0 to be informed that
blkback has this feature.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
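
The wait discussed above, where a block script that manages a
delete-on-close device keeps its own reference until "opened" becomes
"1", could be sketched roughly as follows. This is only an illustration
under assumptions: wait_for_opened, read_opened, the timeout, and the
polling interval are hypothetical names and values, not part of the
patch, and a real script would more likely use a xenstore watch than
poll.

```python
import time
from typing import Callable, Optional


def wait_for_opened(
    read_opened: Callable[[], Optional[str]],
    timeout: float = 5.0,
    poll_interval: float = 0.05,
) -> bool:
    """Return True once read_opened() yields "1", False otherwise.

    read_opened is a hypothetical callback standing in for however the
    block script reads the backend's "opened" node. It is assumed to
    return None when the node does not exist.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read_opened()
        if value is None:
            # Node absent: this blkback does not advertise the feature,
            # so the caller must keep its own reference to the device.
            return False
        if value == "1":
            # blkback now holds a reference; the caller may drop its own.
            return True
        if time.monotonic() >= deadline:
            # Bounded wait, since the block script has an execution timeout.
            return False
        time.sleep(poll_interval)
```

Only scripts that choose to call such a helper pay the extra latency;
a script that just writes physical-device and exits is unaffected.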