On Mon, Jun 12, 2023 at 10:09:39AM +0200, Roger Pau Monné wrote:
> On Fri, Jun 09, 2023 at 12:55:39PM -0400, Demi Marie Obenour wrote:
> > On Fri, Jun 09, 2023 at 05:13:45PM +0200, Roger Pau Monné wrote:
> > > On Thu, Jun 08, 2023 at 11:33:26AM -0400, Demi Marie Obenour wrote:
> > > > On Thu, Jun 08, 2023 at 10:29:18AM +0200, Roger Pau Monné wrote:
> > > > > On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> > > > > > On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > > > > > > Can you fetch a disk using a diskseq identifier?
> > > > > >
> > > > > > Not yet, although I have considered adding this ability.  It would be
> > > > > > one step towards a “diskseqfs” that userspace could use to open a device
> > > > > > by diskseq.
> > > > > >
> > > > > > > While I understand that this is an extra safety check in order to assert
> > > > > > > blkback is opening the intended device, is this attempting to fix some
> > > > > > > existing issue?
> > > > > >
> > > > > > Yes, it is.  I have a block script (written in C) that validates the
> > > > > > device it has opened before passing the information to blkback.  It uses
> > > > > > the diskseq to do this, but for that protection to be complete, blkback
> > > > > > must also be aware of it.
> > > > >
> > > > > But if your block script opens the device, and keeps it open until
> > > > > blkback has also taken a reference to it, there's no way such device
> > > > > could be removed and recreated in the window you point out above, as
> > > > > there's always a reference on it taken?
> > > >
> > > > This assumes that the block script is not killed in the meantime,
> > > > which is not a safe assumption due to timeouts and the OOM killer.
> > >
> > > Doesn't seem very reliable to use with delete-on-close either then.
> >
> > That’s actually the purpose of delete-on-close!  It ensures that if the
> > block script gets killed, the device is automatically cleaned up.
>
> Block script attach getting killed shouldn't prevent the toolstack
> from performing domain destruction, and thus removing the stale block
> device.
>
> OTOH if your toolstack gets killed then there's not much that can be
> done, and the system will need intervention in order to get back into
> a sane state.
>
> Hitting OOM in your control domain however is unlikely to be handled
> gracefully, even with delete-on-close.

I agree, _but_ we should not make it any harder than necessary.

> > > > > Then the block script will open the device by diskseq and pass the
> > > > > major:minor numbers to blkback.
> > > >
> > > > Alternatively, the toolstack could write both the diskseq and
> > > > major:minor numbers and be confident that it is referring to the
> > > > correct device, no matter how long ago it got that information.
> > > > This could be quite useful for e.g. one VM exporting a device to
> > > > another VM by calling losetup(8) and expecting a human to make a
> > > > decision based on various properties about the device.  In this
> > > > case there is no upper bound on the race window.
> > >
> > > Instead of playing with xenstore nodes, it might be better to simply
> > > have blkback export on sysfs the diskseq of the opened device, and let
> > > the block script check whether that's correct or not.  That implies
> > > less code in the kernel side, and doesn't pollute xenstore.
> >
> > This would require that blkback delay exposing the device to the
> > frontend until the block script has checked that the diskseq is correct.
>
> This depends on your toolstack implementation.  libxl won't start the
> domain until block scripts have finished execution, and hence the
> block script waiting for the sysfs node to appear and check it against
> the expected value would be enough.

True, but we cannot assume that everyone is using libxl.

> > Much simpler for the block script to provide the diskseq in xenstore.
> > If you want to avoid an extra xenstore node, I can make the diskseq part
> > of the physical-device node.
>
> I'm thinking that we might want to introduce a "physical-device-uuid"
> node and use that to provide the diskseq to the backend.  Toolstacks
> (or block scripts) would need to be sure the "physical-device-uuid"
> node is populated before setting "physical-device", as writes to
> that node would still trigger blkback watch.  I think using two
> distinct watches would just make the logic in blkback too
> complicated.
>
> My preference would be for the kernel to have a function for opening a
> device identified by a diskseq (as fetched from
> "physical-device-uuid"), so that we don't have to open using
> major:minor and then check the diskseq.

In theory I agree, but in practice it would be a significantly more
complex patch, and given that it does not impact the uAPI, I would
prefer the less-invasive option.  Is there anything more that needs to
be done here, other than replacing the "diskseq" name?  I prefer
"physical-device-luid" because the ID is only valid in one particular
VM.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab