Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks

Roger Pau Monné <roger.pau@xxxxxxxxxx> · Wed, 21 Jun 2023 12:07:05 +0200

On Tue, Jun 20, 2023 at 09:14:25PM -0400, Demi Marie Obenour wrote:
> On Mon, Jun 12, 2023 at 10:09:39AM +0200, Roger Pau Monné wrote:
> > On Fri, Jun 09, 2023 at 12:55:39PM -0400, Demi Marie Obenour wrote:
> > > On Fri, Jun 09, 2023 at 05:13:45PM +0200, Roger Pau Monné wrote:
> > > > On Thu, Jun 08, 2023 at 11:33:26AM -0400, Demi Marie Obenour wrote:
> > > > > On Thu, Jun 08, 2023 at 10:29:18AM +0200, Roger Pau Monné wrote:
> > > > > > On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> > > > > > > On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > > > > > Then the block script will open the device by diskseq and pass the
> > > > > > major:minor numbers to blkback.
> > > > > 
> > > > > Alternatively, the toolstack could write both the diskseq and
> > > > > major:minor numbers and be confident that it is referring to the
> > > > > correct device, no matter how long ago it got that information.
> > > > > This could be quite useful for e.g. one VM exporting a device to
> > > > > another VM by calling losetup(8) and expecting a human to make a
> > > > > decision based on various properties about the device.  In this
> > > > > case there is no upper bound on the race window.
> > > > 
> > > > Instead of playing with xenstore nodes, it might be better to simply
> > > > have blkback export on sysfs the diskseq of the opened device, and let
> > > > the block script check whether that's correct or not.  That implies
> > > > less code in the kernel side, and doesn't pollute xenstore.
> > > 
> > > This would require that blkback delay exposing the device to the
> > > frontend until the block script has checked that the diskseq is correct.
> > 
> > This depends on your toolstack implementation.  libxl won't start the
> > domain until block scripts have finished execution, and hence the
> > block script waiting for the sysfs node to appear and check it against
> > the expected value would be enough.
> 
> True, but we cannot assume that everyone is using libxl.

Right, for the udev case this won't be good, since the domain could be
already running, and hence could potentially attach to the backend
before the hotplug script realized the opened device is wrong.
Likewise for hot add disks.

> > > Much simpler for the block script to provide the diskseq in xenstore.
> > > If you want to avoid an extra xenstore node, I can make the diskseq part
> > > of the physical-device node.
> > 
> > I'm thinking that we might want to introduce a "physical-device-uuid"
> > node and use that to provide the diskseq to the backened.  Toolstacks
> > (or block scripts) would need to be sure the "physical-device-uuid"
> > node is populated before setting "physical-device", as writes to
> > that node would still trigger blkback watch.  I think using two
> > distinct watches would just make the logic in blkback too
> > complicated.
> > 
> > My preference would be for the kernel to have a function for opening a
> > device identified by a diskseq (as fetched from
> > "physical-device-uuid"), so that we don't have to open using
> > major:minor and then check the diskseq.
> 
> In theory I agree, but in practice it would be a significantly more
> complex patch and given that it does not impact the uAPI I would prefer
> the less-invasive option.

>From a blkback point of view I don't see that option as more invasive,
it's actually the other way around IMO.  On blkback we would use
blkdev_get_by_diskseq() (or equivalent) instead of
blkdev_get_by_dev(), so it would result in an overall simpler
change (because the check against diskseq won't be needed anymore).

> Is there anything more that needs to be done
> here, other than replacing the "diskseq" name?

I think we also spoke about using sscanf to parse the option.

The patch to Xen blkif.h needs to be accepted before the Linux side
can progress.

> I prefer
> "physical-device-luid" because the ID is only valid in one particular
> VM.

"physical-device-uid" then maybe?

Thanks, Roger.