* Alex Williamson (alex.williamson@xxxxxxxxxx) wrote:
> On Thu, 7 Mar 2019 23:20:36 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > Sent: Friday, March 8, 2019 1:44 AM
> > >
> > > > > > This kind of data needs to be saved / loaded in the pre-copy
> > > > > > and stop-and-copy phases.
> > > > > > The data of device memory is held in the device memory region.
> > > > > > The size of device memory is usually larger than that of the
> > > > > > device memory region, so qemu needs to save/load it in chunks
> > > > > > of the size of the device memory region.
> > > > > > Not all devices have device memory. IGD, for example, only
> > > > > > uses system memory.
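[Aside: to make that chunking concrete, I'd expect the save path to
boil down to a loop like the sketch below. Everything in it is
invented for illustration, and it assumes the vendor driver advances
an internal cursor on each read of the region, which is only one
possible contract, not something taken from the actual patches.]

  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Sketch only: stream a large device memory blob out through a
   * smaller, fixed-size migration region. */
  static int save_device_memory(int device_fd, off_t region_offset,
                                size_t region_size, uint64_t total_size,
                                FILE *stream)
  {
      void *buf = malloc(region_size);
      uint64_t saved = 0;

      if (!buf)
          return -ENOMEM;

      while (saved < total_size) {
          size_t chunk = (total_size - saved < region_size) ?
                         (size_t)(total_size - saved) : region_size;

          /* Each read is assumed to return the next chunk of device
           * memory through the region window. */
          if (pread(device_fd, buf, chunk, region_offset) != (ssize_t)chunk ||
              fwrite(buf, 1, chunk, stream) != chunk) {
              free(buf);
              return -EIO;
          }
          saved += chunk;
      }

      free(buf);
      return 0;
  }

[The load side would presumably be the mirror image, pwrite()ing each
chunk back into the region.]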
> > > > > It seems a little gratuitous to me that this is a separate
> > > > > region or that this data is handled separately. All of this
> > > > > data is opaque to QEMU, so why do we need to separate it?
> > > >
> > > > hi Alex,
> > > > As the device state interfaces are provided by the kernel, they
> > > > are expected to meet needs as general as possible. So, do you
> > > > think there are use cases where user space knows the device well
> > > > and wants the kernel to return specific desired data back to it?
> > > > E.g. it just wants to get the whole device config data, including
> > > > all mmios, page tables, pci config data...
> > > > Or, it just wants to get a current device memory snapshot, not
> > > > including any dirty data.
> > > > Or, it just needs the dirty pages in device memory or system
> > > > memory.
> > > > With this kind of accurate query, quite a lot of useful features
> > > > can be developed in user space.
> > > >
> > > > If all of this data is opaque to the user app, it seems the only
> > > > use case is live migration.
> > >
> > > I can certainly appreciate a more versatile interface, but I think
> > > we're also trying to create the simplest interface we can, with
> > > the primary target being live migration. As soon as we start
> > > defining this type of device memory and that type of device
> > > memory, we're going to have another device come along that needs
> > > yet another because it has a slightly different requirement. Even
> > > without that, we're going to have vendor drivers implement it
> > > differently, so what works as a more targeted approach for one
> > > device may not work for all devices. Can you enumerate some
> > > specific examples of the use cases you imagine your design to
> > > enable?
> >
> > Do we want to consider a use case where user space would like to
> > selectively introspect a portion of the device state (including
> > implicit state which is not available through PCI regions), and may
> > ask for the capability to directly map selected portions (e.g.
> > device memory) for scanning, instead of always turning on dirty
> > logging for all device state?
>
> I don't see that a migration interface necessarily lends itself to
> this use case. A migration data stream has no requirement to be user
> consumable as anything other than opaque data, and there's also no
> requirement that it expose state in a form that directly represents
> the internal state of the device. In fact I'm not sure we want to
> encourage introspection via this data stream. If a user knows how to
> interpret the data, what prevents them from modifying the data
> in-flight? I've raised the question previously regarding how the
> vendor driver can validate the integrity of the migration data
> stream. Using the migration interface to introspect the device
> certainly suggests an interface ripe for exploiting any potential
> weakness in the vendor driver reassembling that migration stream. If
> the user has an mmap to the actual live working state of the vendor
> driver, protection in the hardware seems like the only way you could
> protect against a malicious user. Please be defensive in what is
> directly exposed to the user and what safeguards are in place within
> the vendor driver for validating incoming data. Thanks,
>
> Alex

Hmm; that sounds like a security-by-obscurity answer!

The scripts/analyze-migration.py script will actually dump the
migration stream data in an almost readable format. So if you properly
define the VMState definitions it should be almost readable; it's
occasionally been useful.

I agree that you should be very, very careful to validate the incoming
migration stream against:
  a) Corruption
  b) Wrong driver versions
  c) Malicious intent
     c.1) Especially by the guest
     c.2) Or by someone trying to feed you a duff stream
  d) Someone trying to load the VFIO stream into completely the wrong
     device.
(A sketch of the kind of header check I mean is at the bottom of this
mail.)

Whether the migration interface is the right thing to use for that
inspection, hmm; well, it might be: if you're trying to debug your
device and need a dump of its state, then why not? (I guess you end up
with something not dissimilar to what things like intel_reg_snapshot
in intel-gpu-tools does.)

Dave

--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
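[The header check referenced above, roughly. The layout, magic value
and all names here are invented for the sketch; nothing comes from the
proposed VFIO interface, and verifying a checksum over the payload
body is left out.]

  #include <stdint.h>

  #define MIG_MAGIC    0x5646494fU   /* arbitrary made-up tag */
  #define MIG_VERSION  1

  struct mig_header {
      uint32_t magic;        /* catches garbage streams (a)           */
      uint32_t version;      /* catches wrong driver versions (b)     */
      uint32_t vendor_id;    /* catches streams meant for a different */
      uint32_t device_id;    /*   device entirely (d)                 */
      uint64_t payload_len;  /* bounds every later read (a, c)        */
  };

  static int check_header(const struct mig_header *h,
                          uint32_t my_vendor, uint32_t my_device,
                          uint64_t max_len)
  {
      if (h->magic != MIG_MAGIC || h->version != MIG_VERSION)
          return -1;         /* (a)/(b): not our format, or wrong rev */
      if (h->vendor_id != my_vendor || h->device_id != my_device)
          return -1;         /* (d): stream from the wrong device     */
      if (h->payload_len > max_len)
          return -1;         /* (c): never trust a length read from
                                the stream; clamp before allocating   */
      return 0;
  }

[Even with a well-formed header, each payload field still needs its
own bounds check before it touches hardware state, since the guest or
a man-in-the-middle can forge the header just as easily.]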