On Tue, Jul 30, 2024 at 12:43 AM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
>
> On 2024/07/29 23:29, Peter Xu wrote:
> > On Mon, Jul 29, 2024 at 01:45:12PM +0900, Akihiko Odaki wrote:
> >> On 2024/07/29 12:50, Jason Wang wrote:
> >>> On Sun, Jul 28, 2024 at 11:19 PM Akihiko Odaki <akihiko.odaki@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 2024/07/27 5:47, Peter Xu wrote:
> >>>>> On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
> >>>>>> On Fri, Jul 26, 2024 at 10:43:42AM -0400, Peter Xu wrote:
> >>>>>>> On Fri, Jul 26, 2024 at 09:48:02AM +0100, Daniel P. Berrangé wrote:
> >>>>>>>> On Fri, Jul 26, 2024 at 09:03:24AM +0200, Thomas Huth wrote:
> >>>>>>>>> On 26/07/2024 08.08, Michael S. Tsirkin wrote:
> >>>>>>>>>> On Thu, Jul 25, 2024 at 06:18:20PM -0400, Peter Xu wrote:
> >>>>>>>>>>> On Tue, Aug 01, 2023 at 01:31:48AM +0300, Yuri Benditovich wrote:
> >>>>>>>>>>>> USO features of virtio-net device depend on kernel ability
> >>>>>>>>>>>> to support them, for backward compatibility by default the
> >>>>>>>>>>>> features are disabled on 8.0 and earlier.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Yuri Benditovich <yuri.benditovich@xxxxxxxxxx>
> >>>>>>>>>>>> Signed-off-by: Andrew Melnychecnko <andrew@xxxxxxxxxx>
> >>>>>>>>>>>
> >>>>>>>>>>> Looks like this patch broke migration when the VM starts on a host that has
> >>>>>>>>>>> USO supported, to another host that doesn't..
> >>>>>>>>>>
> >>>>>>>>>> This was always the case with all offloads. The answer at the moment is,
> >>>>>>>>>> don't do this.
> >>>>>>>>>
> >>>>>>>>> May I ask for my understanding:
> >>>>>>>>> "don't do this" = don't automatically enable/disable virtio features in QEMU
> >>>>>>>>> depending on host kernel features, or "don't do this" = don't try to migrate
> >>>>>>>>> between machines that have different host kernel features?
> >>>>>>>>>
> >>>>>>>>>> Long term, we need to start exposing management APIs
> >>>>>>>>>> to discover this, and management has to disable unsupported features.
> >>>>>>>>>
> >>>>>>>>> Ack, this likely needs some treatments from the libvirt side, too.
> >>>>>>>>
> >>>>>>>> When QEMU automatically toggles machine type features based on host
> >>>>>>>> kernel, relying on libvirt to then disable them again is impractical,
> >>>>>>>> as we cannot assume that the libvirt people are using knows about
> >>>>>>>> newly introduced features. Even if libvirt is updated to know about
> >>>>>>>> it, people can easily be using a previous libvirt release.
> >>>>>>>>
> >>>>>>>> QEMU itself needs to make the machine types do what they are there
> >>>>>>>> to do, which is to define a stable machine ABI.
> >>>>>>>>
> >>>>>>>> What QEMU is missing here is a "platform ABI" concept, to encode
> >>>>>>>> sets of features which are tied to specific platform generations.
> >>>>>>>> As long as we don't have that we'll keep having these broken
> >>>>>>>> migration problems from machine types dynamically changing instead
> >>>>>>>> of providing a stable guest ABI.
> >>>>>>>
> >>>>>>> Any more elaboration on this idea? Would it be easily feasible in
> >>>>>>> implementation?
> >>>>>>
> >>>>>> In terms of launching QEMU I'd imagine:
> >>>>>>
> >>>>>>   $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
> >>>>>>
> >>>>>> Any virtual machine HW features which are tied to host kernel features
> >>>>>> would have their defaults set based on the requested -platform. The
> >>>>>> -machine will be fully invariant wrt the host kernel.
> >>>>>>
> >>>>>> You would have -platform help to list available platforms, and
> >>>>>> corresponding QMP "query-platforms" command to list what platforms
> >>>>>> are supported on a given host OS.
> >>>>>>
> >>>>>> Downstream distros can provide their own platforms definitions
> >>>>>> (eg "linux-rhel-9.5") if they have kernels whose feature set
> >>>>>> diverges from upstream due to backports.
> >>>>>>
> >>>>>> Mgmt apps won't need to be taught about every single little QEMU
> >>>>>> setting whose default is derived from the kernel. Individual
> >>>>>> defaults are opaque and controlled by the requested platform.
> >>>>>>
> >>>>>> Live migration has clearly defined semantics, and mgmt app can
> >>>>>> use query-platforms to validate two hosts are compatible.
> >>>>>>
> >>>>>> Omitting -platform should pick the very latest platform that is
> >>>>>> compatible with the current host (not necessarily the latest
> >>>>>> platform built-in to QEMU).
> >>>>>
> >>>>> This seems to add one more layer to maintain, and so far I don't know
> >>>>> whether it's a must.
> >>>>>
> >>>>> To put it simply, can we simply rely on qemu cmdline as "the guest ABI"? I
> >>>>> thought it was mostly the case already, except some extremely rare
> >>>>> outliers.
> >>>>>
> >>>>> When we have one host that boots up a VM using:
> >>>>>
> >>>>>   $QEMU1 $cmdline
> >>>>>
> >>>>> Then another host boots up:
> >>>>>
> >>>>>   $QEMU2 $cmdline -incoming XXX
> >>>>>
> >>>>> Then migration should succeed if $cmdline is exactly the same, and the VM
> >>>>> can boot up all fine without errors on both sides.
> >>>>>
> >>>>> AFAICT this has nothing to do with what kernel is underneath, even not
> >>>>> Linux? I think either QEMU1 / QEMU2 has the option to fail. But if it
> >>>>> didn't, I thought the ABI should be guaranteed.
> >>>>>
> >>>>> That's why I think this is a migration violation, as 99.99% of other device
> >>>>> properties should be following this rule. The issue here is, we have the
> >>>>> same virtio-net-pci cmdline on both sides in this case, but the ABI got
> >>>>> broken.
> >>>>>
> >>>>> That's also why I was suggesting if the property contributes to the guest
> >>>>> ABI, then AFAIU QEMU needs to:
> >>>>>
> >>>>> - Firstly, never quietly flip any bit that affects the ABI...
> >>>>>
> >>>>> - Have a default value of off, then QEMU will always allow the VM to boot
> >>>>>   by default, while advanced users can opt-in on new features. We can't
> >>>>>   make this ON by default otherwise some VMs can already fail to boot,
> >>>>
> >>>> It may not necessarily be the case that old features are supported by
> >>>> every system. In an extreme case, a user may migrate a VM from Linux to
> >>>> Windows, which probably doesn't support any offloading at all. A more
> >>>> convincing scenario is RSS offloading with eBPF; using eBPF requires a
> >>>> privilege so we cannot assume it is always available even on the latest
> >>>> version of Linux.
> >>>
> >>> I don't get why eBPF matters here. It is something that is not noticed
> >>> by the guest and we have a fallback anyhow.
>
> It is noticeable for the guest, and the fallback is not effective with
> vhost.

It's a bug then. QEMU can fall back to tuntap if it sees issues in vhost.

Thanks