Re: [PATCH RFC] vfio: Documentation for the migration region


 



On Thu, Nov 25, 2021 at 01:27:12PM +0100, Cornelia Huck wrote:
> On Wed, Nov 24 2021, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
> > On Wed, Nov 24, 2021 at 05:55:49PM +0100, Cornelia Huck wrote:
> 
> >> What I meant to say: If we give userspace the flexibility to operate
> >> this, we also must give different device types some flexibility. While
> >> subchannels will follow the general flow, they'll probably condense/omit
> >> some steps, as I/O is quite different to PCI there.
> >
> > I would say no - migration is general, no device type should get to
> > violate this spec.  Did you have something specific in mind? There is
> > very little PCI specific here already
> 
> I'm not really thinking about violating the spec, but more omitting
> things that do not really apply to the hardware. For example, it is
> really easy to shut up a subchannel, we don't really need to wait until
> nothing happens anymore, and it doesn't even have MMIO. 

I've never really looked closely at the s390 mdev drivers..

What does something like AP even do anyhow? The ioctl handler doesn't
do anything, there is no mmap hook, how does the VFIO userspace
interact with this thing?

> > In general, userspace can issue a VFIO_DEVICE_RESET ioctl and recover the
> > device back to device_state RUNNING. When a migration driver executes this
> > ioctl it should discard the data window and set migration_state to RUNNING as
> > part of resetting the device to a clean state. This must happen even if the
> > migration_state has errored. A freshly opened device FD should always be in
> > the RUNNING state.
> 
> Can the state immediately change from RUNNING to ERROR again?

Immediately? State change can only happen in response to the ioctl or
the reset.

"The migration_state cannot change asynchronously, upon writing the
migration_state the driver will either keep the current state and return
failure, return failure and go to ERROR, or succeed and go to the new state."
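
As a rough illustration of that rule, here is a minimal userspace model of
the three outcomes of a state write, plus the reset path. Everything in it
is made up for the example: the enum names, the struct, the helper
functions, and the errno values are assumptions, not the uAPI or any
driver's actual code.

```c
#include <errno.h>

/* Hypothetical state names for illustration; not the uAPI definitions. */
enum mig_state { MIG_RUNNING, MIG_STOP, MIG_SAVING, MIG_RESUMING, MIG_ERROR };

/* What a hypothetical driver decides about a requested transition. */
enum mig_outcome { MIG_OK, MIG_REFUSED, MIG_FATAL };

struct mig_dev {
	enum mig_state state;
};

/*
 * Model of a write to migration_state. The state only ever changes here
 * or in mig_reset(), never asynchronously. Exactly one of three things
 * happens:
 *   - success: move to new_state, return 0
 *   - refusal: keep the current state, return failure
 *   - fatal:   go to ERROR, return failure
 * The errno values are assumptions chosen for the sketch.
 */
static int mig_write_state(struct mig_dev *dev, enum mig_state new_state,
			   enum mig_outcome outcome)
{
	switch (outcome) {
	case MIG_OK:
		dev->state = new_state;
		return 0;
	case MIG_REFUSED:
		return -EINVAL;		/* state is left untouched */
	case MIG_FATAL:
	default:
		dev->state = MIG_ERROR;
		return -EIO;
	}
}

/*
 * Model of VFIO_DEVICE_RESET: discard any migration data and return to
 * RUNNING, even if the device was in ERROR.
 */
static void mig_reset(struct mig_dev *dev)
{
	dev->state = MIG_RUNNING;
}
```

In this model a device that has just been reset stays in RUNNING until
userspace writes migration_state again, so it cannot go from RUNNING back
to ERROR on its own.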

> > However, a device may not compromise system integrity if it is subjected to a
> > MMIO. It can not trigger an error TLP, it can not trigger a Machine Check, and
> > it can not compromise device isolation.
> 
> "Machine Check" may be confusing to readers coming from s390; there, the
> device does not trigger the machine check, but the channel subsystem
> does, and we cannot prevent it. Maybe we can word it more as an example,
> so readers get an idea what the limits in this state are?

Let's say x86 machine check then, which is a kernel-fatal event.

> Although I would like to see some more feedback from others, I think
> this is already a huge step in the right direction.

Thanks, I made all your other changes

Will send a v2 next week

Jason 


