On 11/24/2016 06:31 AM, Yuanhan Liu wrote:
> On Tue, Nov 22, 2016 at 04:53:05PM +0200, Michael S. Tsirkin wrote:
>>>> You keep assuming that you have the VM started first and
>>>> figure out things afterwards, but this does not work.
>>>>
>>>> Think about a cluster of machines. You want to start a VM in
>>>> a way that will ensure compatibility with all hosts
>>>> in a cluster.
>>>
>>> I see. I was thinking more about the case where the dst
>>> host (including the qemu and dpdk combo) is given, and
>>> we then determine whether it will be a successful migration
>>> or not.
>>>
>>> And you are saying that we need to know which host could
>>> be a good candidate before starting the migration. In that
>>> case, we indeed need some input from both qemu and the
>>> vhost-user backend.
>>>
>>> For DPDK, I think it could be simple, just as you said, it
>>> could be either a tiny script, or even a macro defined in
>>> the source code file (we extend it every time we add a
>>> new feature) to let libvirt read it. Or something
>>> else.
>>
>> There's the issue of APIs that tweak features as Maxime
>> suggested.
>
> Yes, it's a good point.
>
>> Maybe the only thing to do is to deprecate it,
>
> Looks like so.
>
>> but I feel some way for application to pass info into
>> guest might be beneficial.
>
> The two APIs are just for tweaking the feature bits DPDK supports before
> any device gets connected. It's another way to disable some features
> (the other obvious way is through the QEMU command line).
>
> IMO, it's a bit handy only in a case like: we have a bunch of VMs. Instead
> of disabling something through qemu one by one, we could disable it
> once in DPDK.
>
> But I doubt the usefulness of it. It's only used in DPDK's vhost example
> after all. It is not used in vhost pmd, nor is it used in OVS.

rte_vhost_feature_disable() is currently used in OVS, in lib/netdev-dpdk.c:

netdev_dpdk_vhost_class_init(void)
{
    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;

    /* This function can be called for different classes.  The initialization
     * needs to be done only once */
    if (ovsthread_once_start(&once)) {
        rte_vhost_driver_callback_register(&virtio_net_device_ops);
        rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4
                                  | 1ULL << VIRTIO_NET_F_HOST_TSO6
                                  | 1ULL << VIRTIO_NET_F_CSUM);

>
>>>> If you don't, guest visible interface will change
>>>> and you won't be able to migrate.
>>>>
>>>> It does not make sense to discuss feature bits specifically
>>>> since that is not the only part of interface.
>>>> For example, max ring size supported might change.
>>>
>>> I don't quite understand why we have to consider the max ring
>>> size here. Isn't it a virtio device attribute, so that QEMU could
>>> provide such compatibility information?
>>>
>>> I mean, DPDK is supposed to support varying vring sizes; it's up to
>>> QEMU to give a specific value.
>>
>> If backend supports s/g of any size up to 2^16, there's no issue.
>
> I don't know about others, but I see no issues in DPDK.
>
>> ATM some backends might be assuming up to 1K s/g since
>> QEMU never supported bigger ones. We might classify this
>> as a bug, or not and add a feature flag.
>>
>> But it's just an example. There might be more values at issue
>> in the future.
>
> Yeah, maybe. But we could analyze them one by one.
>
>>>> Let me describe how it works in qemu/libvirt.
>>>> When you install a VM, you can specify compatibility
>>>> level (aka "machine type"), and you can query the supported compatibility
>>>> levels. Management uses that to find the supported compatibility
>>>> and stores the compatibility in XML that is migrated with the VM.
>>>> There's also a way to find the latest level which is the
>>>> default unless overridden by user, again this level
>>>> is recorded and then
>>>> - management can make sure migration destination is compatible
>>>> - management can avoid migration to hosts without that support
>>>
>>> Thanks for the info, it helps.
>>>
>>> ...
>>>>>>>> As version here is an opaque string for libvirt and qemu,
>>>>>>>> anything can be used - but I suggest either a list
>>>>>>>> of values defining the interface, e.g.
>>>>>>>> any_layout=on,max_ring=256
>>>>>>>> or a version including the name and vendor of the backend,
>>>>>>>> e.g. "org.dpdk.v4.5.6".
>>>
>>> The version scheme may not be ideal here. Assume a QEMU is supposed
>>> to work with a specific DPDK version; however, the user may disable some
>>> newer features through the qemu command line, so that it could also work
>>> with an older DPDK version. Using the version scheme will not allow us to
>>> do such a migration to an older DPDK version. The MTU is a live example
>>> here? (when the MTU feature is provided by QEMU but is actually disabled
>>> by the user, it could also work with an older DPDK without MTU support).
>>>
>>> --yliu
>>
>> OK, so does a list of values look better to you then?
>
> Yes, if there is no better way.
>
> And I think it may be better not to list all those features literally.
> Instead, using a number should be better, say, features=0xdeadbeef.
>
> Listing the feature names means all the components involved here
> (QEMU, libvirt, DPDK, VPP, and maybe more backends) have to agree on
> the exact same feature names. Though it may not be a big deal, it
> lacks some flexibility.
>
> Feature bits will not have this issue.
>
> --yliu
>
>>
>>
>>>>>>>>
>>>>>>>> Note that typically the list of supported versions can only be
>>>>>>>> extended, not shrunk.
>>>>>>>> Also, if the host/guest interface
>>>>>>>> does not change, don't change the current version as
>>>>>>>> this just creates work for everyone.
>>>>>>>>
>>>>>>>> Thoughts? Would this work well for management? dpdk? vpp?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> --
>>>>>>>> MST
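
To make the features=0x... idea from the thread a bit more concrete, here is a minimal sketch in C of how a management layer might compare feature masks before picking a migration destination. Everything in it is an assumption for illustration: the helper names (backend_features, migration_compatible, parse_features) are hypothetical and are not DPDK, QEMU or libvirt APIs, and the "features=" string format is only the example value floated above. The bit positions are the ones defined by the virtio specification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Feature bit positions as defined by the virtio specification. */
#define VIRTIO_NET_F_CSUM       0
#define VIRTIO_NET_F_MTU        3
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12

/* Hypothetical helper: the mask a backend would advertise once some of
 * the features it supports have been switched off (the effect discussed
 * for the feature-disable API). */
static uint64_t backend_features(uint64_t supported, uint64_t disabled)
{
    return supported & ~disabled;
}

/* Hypothetical helper: a destination is acceptable only if every feature
 * bit the guest has already seen is also offered there. */
static bool migration_compatible(uint64_t guest_visible, uint64_t dst_features)
{
    return (guest_visible & ~dst_features) == 0;
}

/* Hypothetical helper: parse a string such as "features=0xdeadbeef".
 * The key name and format are assumptions, not an agreed interface. */
static bool parse_features(const char *s, uint64_t *out)
{
    static const char prefix[] = "features=";
    char *end;

    assert(s != NULL && out != NULL);
    if (strncmp(s, prefix, sizeof(prefix) - 1) != 0) {
        return false;
    }
    s += sizeof(prefix) - 1;
    if (*s == '\0') {
        return false;
    }
    errno = 0;
    *out = strtoull(s, &end, 16);   /* base 16 accepts an optional "0x" */
    return errno == 0 && *end == '\0';
}
```

With these helpers, the MTU case above works out as expected: if the guest-visible mask has the MTU bit cleared because the user disabled it, migration_compatible() accepts an older backend whose mask lacks VIRTIO_NET_F_MTU, whereas a version string comparison would have rejected it. Values that are not feature bits, such as the max ring size, would still need to be compared separately.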