On Wed, 20 Feb 2019 11:28:46 +0000
"Gonglei (Arei)" <arei.gonglei@xxxxxxxxxx> wrote:

> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilbert@xxxxxxxxxx]
> > Sent: Wednesday, February 20, 2019 7:02 PM
> > To: Zhao Yan <yan.y.zhao@xxxxxxxxx>
> > Cc: cjia@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; aik@xxxxxxxxx;
> > Zhengxiao.zx@xxxxxxxxxxxxxxx; shuangtai.tst@xxxxxxxxxxxxxxx;
> > qemu-devel@xxxxxxxxxx; kwankhede@xxxxxxxxxx; eauger@xxxxxxxxxx;
> > yi.l.liu@xxxxxxxxx; eskultet@xxxxxxxxxx; ziye.yang@xxxxxxxxx;
> > mlevitsk@xxxxxxxxxx; pasic@xxxxxxxxxxxxx; Gonglei (Arei)
> > <arei.gonglei@xxxxxxxxxx>; felipe@xxxxxxxxxxx; Ken.Xue@xxxxxxx;
> > kevin.tian@xxxxxxxxx; alex.williamson@xxxxxxxxxx;
> > intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx; changpeng.liu@xxxxxxxxx;
> > cohuck@xxxxxxxxxx; zhi.a.wang@xxxxxxxxx; jonathan.davies@xxxxxxxxxxx
> > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> >
> > * Zhao Yan (yan.y.zhao@xxxxxxxxx) wrote:
> > > On Tue, Feb 19, 2019 at 11:32:13AM +0000, Dr. David Alan Gilbert wrote:
> > > > * Yan Zhao (yan.y.zhao@xxxxxxxxx) wrote:
> > > > > This patchset enables VFIO devices to have live migration capability.
> > > > > Currently it does not support post-copy phase.
> > > > >
> > > > > It follows Alex's comments on last version of VFIO live migration patches,
> > > > > including device states, VFIO device state region layout, dirty bitmap's
> > > > > query.
>
> > > > b) How do we detect if we're migrating from/to the wrong device or
> > > > version of device? Or say to a device with older firmware or perhaps
> > > > a device that has less device memory?
> > > Actually it's still an open question for VFIO migration. Need to think about
> > > whether it's better to check that in libvirt or qemu (like a device magic
> > > along with version?).
>
> We must keep the hardware generation the same within one POD of public cloud
> providers.
> But we are still thinking about live migration from a lower generation
> of hardware to a higher generation.

Agreed, lower->higher is the one direction that might make sense to
support.

But regardless of that, I think we need to make sure that incompatible
devices/versions fail directly instead of failing in a subtle, hard to
debug way. Might be useful to do some initial sanity checks in libvirt
as well.

How easy is it to obtain that information in a form that can be
consumed by higher layers? Can we find out the device type at least?
What about some kind of revision?
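
To make the "fail directly" idea a bit more concrete, here is a rough
sketch in Python of the kind of early check a higher layer could run
before starting a migration. Everything in it is hypothetical:
DeviceVersion, its device_type and generation fields,
IncompatibleDeviceError, and check_migration_compat() are made-up names,
and none of this reflects an existing VFIO, QEMU, or libvirt interface.
It also assumes that a device type and a comparable generation number
can be obtained at all, which is exactly the open question above.

```python
from dataclasses import dataclass


class IncompatibleDeviceError(Exception):
    """Raised when migration between two devices must be refused."""


@dataclass(frozen=True)
class DeviceVersion:
    # Hypothetical fields; a real interface would have to define them.
    device_type: str  # identifies the kind of device
    generation: int   # assumed: a higher number means newer hardware


def check_migration_compat(src: DeviceVersion, dst: DeviceVersion) -> None:
    """Refuse an incompatible migration up front instead of failing later."""
    if src.device_type != dst.device_type:
        raise IncompatibleDeviceError(
            f"device type mismatch: {src.device_type!r} -> {dst.device_type!r}"
        )
    # Assumed policy: same generation or lower->higher is allowed,
    # higher->lower is always rejected.
    if dst.generation < src.generation:
        raise IncompatibleDeviceError(
            f"cannot migrate from generation {src.generation} "
            f"to older generation {dst.generation}"
        )
```

The point of raising an explicit error is that the caller gets a clear
message before any device state is transferred, rather than a guest
that misbehaves at some point after the migration.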