On Thu, 16 Jul 2020 16:32:30 +0800
Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:

> On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote:
> >
> > On 2020/7/14 7:29 AM, Yan Zhao wrote:
> > > hi folks,
> > > we are defining a device migration compatibility interface that helps upper
> > > layer stack like openstack/ovirt/libvirt to check if two devices are
> > > live migration compatible.
> > > The "devices" here could be MDEVs, physical devices, or hybrid of the two.
> > > e.g. we could use it to check whether
> > > - a src MDEV can migrate to a target MDEV,
> > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > - a src MDEV can migrate to a target VF in SRIOV.
> > >   (e.g. SIOV/SRIOV backward compatibility case)
> > >
> > > The upper layer stack could use this interface as the last step to check
> > > if one device is able to migrate to another device before triggering a real
> > > live migration procedure.
> > > we are not sure if this interface is of value or help to you. please don't
> > > hesitate to drop your valuable comments.
> > >
> > >
> > > (1) interface definition
> > > The interface is defined in below way:
> > >
> > >              __    userspace
> > >               /\              \
> > >              /                 \write
> > >             / read              \
> > >    ________/__________       ___\|/_____________
> > >   | migration_version |     | migration_version |-->check migration
> > >   ---------------------     ---------------------   compatibility
> > >      device A                  device B
> > >
> > >
> > > a device attribute named migration_version is defined under each device's
> > > sysfs node. e.g. (/sys/bus/pci/devices/0000\:00\:02.0/$mdev_UUID/migration_version).
> >
> >
> > Are you aware of the devlink based device management interface that is
> > proposed upstream? I think it has many advantages over sysfs, do you
> > consider to switch to that?

Advantages, such as?

> not familiar with the devlink. will do some research of it.
> > >
> > > userspace tools read the migration_version as a string from the source device,
> > > and write it to the migration_version sysfs attribute in the target device.
> > >
> > > The userspace should treat ANY of below conditions as two devices not compatible:
> > > - any one of the two devices does not have a migration_version attribute
> > > - error when reading from migration_version attribute of one device
> > > - error when writing migration_version string of one device to
> > >   migration_version attribute of the other device
> > >
> > > The string read from migration_version attribute is defined by device vendor
> > > driver and is completely opaque to the userspace.
> >
> >
> > My understanding is that something opaque to userspace is not the philosophy
> > of Linux.
>
> but the VFIO live migration in itself is essentially a big opaque stream to userspace.
>
> > Instead of having a generic API but opaque value, why not do in a
> > vendor specific way like:
> >
> > 1) exposing the device capability in a vendor specific way via sysfs/devlink
> > or other API
> > 2) management read capability in both src and dst and determine whether we
> > can do the migration
> >
> > This is the way we plan to do with vDPA.
> >
> yes, in another reply, Alex proposed to use an interface in json format.
> I guess we can define something like
>
> { "self" :
>     [
>       { "pciid" : "8086591d",
>         "driver" : "i915",
>         "gvt-version" : "v1",
>         "mdev_type" : "i915-GVTg_V5_2",
>         "aggregator" : "1",
>         "pv-mode" : "none"
>       }
>     ],
>   "compatible" :
>     [
>       { "pciid" : "8086591d",
>         "driver" : "i915",
>         "gvt-version" : "v1",
>         "mdev_type" : "i915-GVTg_V5_2",
>         "aggregator" : "1",
>         "pv-mode" : "none"
>       },
>       { "pciid" : "8086591d",
>         "driver" : "i915",
>         "gvt-version" : "v1",
>         "mdev_type" : "i915-GVTg_V5_4",
>         "aggregator" : "2",
>         "pv-mode" : "none"
>       },
>       { "pciid" : "8086591d",
>         "driver" : "i915",
>         "gvt-version" : "v2",
>         "mdev_type" : "i915-GVTg_V5_4",
>         "aggregator" : "2",
>         "pv-mode" : "none, ppgtt, context"
>       }
>       ...
>     ]
> }
>
> But as those fields are mostly vendor specific, the userspace can
> only do simple string comparing, I guess the list would be very long as
> it needs to enumerate all possible targets.

This ignores so much of what I tried to achieve in my example :(

> also, in some fields like "gvt-version", is there a simple way to express
> things like v2+?

That's not a reasonable thing to express anyway, how can you be certain
that v3 won't break compatibility with v2?  Sean proposed a versioning
scheme that accounts for this, using an x.y.z version expressing the
major, minor, and bugfix versions, where there is no compatibility
across major versions, minor versions have forward compatibility (ex. 1
-> 2 is ok, 2 -> 1 is not) and bugfix version number indicates some
degree of internal improvement that is not visible to the user in terms
of features or compatibility, but provides a basis for preferring
equally compatible candidates.

> If the userspace can read this interface both in src and target and
> check whether both src and target are in corresponding compatible list, I
> think it will work for us.
>
> But still, kernel should not rely on userspace's choice, the opaque
> compatibility string is still required in kernel.
> No matter whether
> it would be exposed to userspace as a compatibility checking interface,
> vendor driver would keep this part of code and embed the string into the
> migration stream. so exposing it as an interface to be used by libvirt to
> do a safety check before a real live migration is only about enabling
> the kernel part of check to happen ahead.

As you indicate, the vendor driver is responsible for checking version
information embedded within the migration stream.  Therefore a
migration should fail early if the devices are incompatible.  Is it
really libvirt's place to second guess what it has been directed to do?
Why would we even proceed to design a user parse-able version interface
if we still have a dependency on an opaque interface?  Thanks,

Alex
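
For illustration, the read-then-write check from the original proposal
quoted in this thread could look roughly like the Python sketch below.
The function name and its two directory arguments are hypothetical;
only the attribute name migration_version and the three failure
conditions come from the proposal. The string is passed through
unparsed, since it is opaque to userspace.

```python
from pathlib import Path


def migration_compatible(src_dev: str, dst_dev: str) -> bool:
    """Return True only if dst accepts src's migration_version string.

    src_dev and dst_dev are device sysfs directories (hypothetical
    arguments for this sketch). A missing attribute, a read error, or a
    write error all mean "not compatible", as the proposal requires.
    """
    src_attr = Path(src_dev) / "migration_version"
    dst_attr = Path(dst_dev) / "migration_version"

    # Condition 1: one of the two devices has no migration_version attribute.
    if not src_attr.is_file() or not dst_attr.is_file():
        return False

    try:
        # Condition 2: a read error on the source raises OSError.
        version = src_attr.read_bytes()
        # Condition 3: in the proposal the target's vendor driver rejects an
        # incompatible string by failing this write. Unbuffered, so the
        # error surfaces at write() rather than at close().
        with open(dst_attr, "wb", buffering=0) as f:
            f.write(version)
    except OSError:
        return False
    return True
```

A caller such as a management stack would run this once per candidate
target, as the last step before starting a real migration.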
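
As a rough illustration of the x.y.z scheme attributed to Sean in this
message, the rule can be sketched in Python as below. All names are
hypothetical. The message does not say which bugfix level should be
preferred, so the sketch assumes that a higher bugfix number wins and
does not rank candidates by minor version.

```python
from typing import NamedTuple, Optional, Sequence


class Version(NamedTuple):
    major: int
    minor: int
    bugfix: int


def parse(text: str) -> Version:
    """Parse an "x.y.z" string; raises ValueError on any other shape."""
    major, minor, bugfix = (int(part) for part in text.split("."))
    return Version(major, minor, bugfix)


def can_migrate(src: Version, dst: Version) -> bool:
    """No compatibility across majors; minors are forward compatible only."""
    return src.major == dst.major and src.minor <= dst.minor


def pick_target(src: Version, candidates: Sequence[Version]) -> Optional[Version]:
    """Pick a compatible target, or None if there is none.

    Assumption of this sketch: among compatible targets, the one with the
    highest bugfix number is preferred.
    """
    compatible = [c for c in candidates if can_migrate(src, c)]
    return max(compatible, key=lambda c: c.bugfix, default=None)
```

With this rule, a source at 1.1.0 may move to 1.2.0 but not the other
way round, and nothing at 1.x may move to 2.x.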