hi Alex and Dave, Thanks for your replies. Please see my comments inline. On Thu, Mar 28, 2019 at 06:10:20AM +0800, Alex Williamson wrote: > On Wed, 27 Mar 2019 20:18:54 +0000 > "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> wrote: > > > * Zhao Yan (yan.y.zhao@xxxxxxxxx) wrote: > > > On Wed, Feb 20, 2019 at 07:42:42PM +0800, Cornelia Huck wrote: > > > > > > > > b) How do we detect if we're migrating from/to the wrong device or > > > > > > > > version of device? Or say to a device with older firmware or perhaps > > > > > > > > a device that has less device memory ? > > > > > > > Actually it's still an open for VFIO migration. Need to think about > > > > > > > whether it's better to check that in libvirt or qemu (like a device magic > > > > > > > along with verion ?). > > > > > > > > > > We must keep the hardware generation is the same with one POD of public cloud > > > > > providers. But we still think about the live migration between from the the lower > > > > > generation of hardware migrated to the higher generation. > > > > > > > > Agreed, lower->higher is the one direction that might make sense to > > > > support. > > > > > > > > But regardless of that, I think we need to make sure that incompatible > > > > devices/versions fail directly instead of failing in a subtle, hard to > > > > debug way. Might be useful to do some initial sanity checks in libvirt > > > > as well. > > > > > > > > How easy is it to obtain that information in a form that can be > > > > consumed by higher layers? Can we find out the device type at least? > > > > What about some kind of revision? > > > hi Alex and Cornelia > > > for device compatibility, do you think it's a good idea to use "version" > > > and "device version" fields? > > > > > > version field: identify live migration interface's version. it can have a > > > sort of backward compatibility, like target machine's version >= source > > > machine's version. something like that. > > Don't we essentially already have this via the device specific region? > The struct vfio_info_cap_header includes id and version fields, so we > can declare a migration id and increment the version for any > incompatible changes to the protocol. yes, good idea! so, what about declaring below new cap? #define VFIO_REGION_INFO_CAP_MIGRATION 4 struct vfio_region_info_cap_migration { struct vfio_info_cap_header header; __u32 device_version_len; __u8 device_version[]; }; > > > > > > device_version field consists two parts: > > > 1. vendor id : it takes 32 bits. e.g. 0x8086. > > Who allocates IDs? If we're going to use PCI vendor IDs, then I'd > suggest we use a bit to flag it as such so we can reserve that portion > of the 32bit address space. See for example: > > #define VFIO_REGION_TYPE_PCI_VENDOR_TYPE (1 << 31) > #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0xffff) > > For vendor specific regions. Yes, use PCI vendor ID. you are right, we need to use highest bit (VFIO_REGION_TYPE_PCI_VENDOR_TYPE) to identify it's a PCI ID. Thanks for pointing it out. But, I have a question. what is VFIO_REGION_TYPE_PCI_VENDOR_MASK used for? why it's 0xffff? I searched QEMU and kernel code and did not find anywhere uses it. > > > 2. vendor proprietary string: it can be any string that a vendor driver > > > thinks can identify a source device. e.g. pciid + mdev type. > > > "vendor id" is to avoid overlap of "vendor proprietary string". > > > > > > > > > struct vfio_device_state_ctl { > > > __u32 version; /* ro */ > > > __u8 device_version[MAX_DEVICE_VERSION_LEN]; /* ro */ > > > struct { > > > __u32 action; /* GET_BUFFER, SET_BUFFER, IS_COMPATIBLE*/ > > > ... > > > }data; > > > ... > > > }; > > We have a buffer area where we can read and write data from the vendor > driver, why would we have this IS_COMPATIBLE protocol to write the > device version string but use a static fixed length version string in > the control header to read it? IOW, let's use GET_VERSION, > CHECK_VERSION actions that make use of the buffer area and allow vendor > specific version information length. you are right, such static fixed length version string is bad :) To get device version, do you think which approach below is better? 1. use GET_VERSION action, and read from region buffer 2. get it when querying cap VFIO_REGION_INFO_CAP_MIGRATION seems approach 1 is better, and cap VFIO_REGION_INFO_CAP_MIGRATION is only for checking migration interface's version? > > > > > > Then, an action IS_COMPATIBLE is added to check device compatibility. > > > > > > The flow to figure out whether a source device is migratable to target device > > > is like that: > > > 1. in source side's .save_setup, save source device's device_version string > > > 2. in target side's .load_state, load source device's device version string > > > and write it to data region, and call IS_COMPATIBLE action to ask vendor driver > > > to check whether the source device is compatible to it. > > > > > > The advantage of adding an IS_COMPATIBLE action is that, vendor driver can > > > maintain a compatibility table and decide whether source device is compatible > > > to target device according to its proprietary table. > > > In device_version string, vendor driver only has to describe the source > > > device as elaborately as possible and resorts to vendor driver in target side > > > to figure out whether they are compatible. > > I agree, it's too complicated and restrictive to try to create an > interface for the user to determine compatibility, let the driver > declare it compatible or not. :) > > It would also be good if the 'IS_COMPATIBLE' was somehow callable > > externally - so we could be able to answer a question like 'can we > > migrate this VM to this host' - from the management layer before it > > actually starts the migration. so qemu needs to expose two qmp/sysfs interfaces: GET_VERSION and CHECK_VERSION. GET_VERSION returns a vm's device's version string. CHECK_VERSION's input is device version string and return compatible/non-compatible. Do you think it's good? > I think we'd need to mirror this capability in sysfs to support that, > or create a qmp interface through QEMU that the device owner could make > the request on behalf of the management layer. Getting access to the > vfio device requires an iommu context that's already in use by the > device owner, we have no intention of supporting a model that allows > independent tasks access to a device. Thanks, > Alex > do you think two sysfs nodes under a device node is ok? e.g. /sys/devices/pci0000\:00/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/get_version /sys/devices/pci0000\:00/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/check_version Thanks Yan