On Tue, Jul 30, 2024 at 02:23:46AM +0900, Akihiko Odaki wrote:
> On 2024/07/30 2:00, Peter Xu wrote:
> > On Mon, Jul 29, 2024 at 04:58:03PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Jul 26, 2024 at 04:47:40PM -0400, Peter Xu wrote:
> > > > On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
> > > > >
> > > > > In terms of launching QEMU I'd imagine:
> > > > >
> > > > >   $QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
> > > > >
> > > > > Any virtual machine HW features which are tied to host kernel features
> > > > > would have their defaults set based on the requested -platform. The
> > > > > -machine will be fully invariant wrt the host kernel.
> > > > >
> > > > > You would have -platform help to list available platforms, and
> > > > > corresponding QMP "query-platforms" command to list what platforms
> > > > > are supported on a given host OS.
> > > > >
> > > > > Downstream distros can provide their own platforms definitions
> > > > > (eg "linux-rhel-9.5") if they have kernels whose feature set
> > > > > diverges from upstream due to backports.
> > > > >
> > > > > Mgmt apps won't need to be taught about every single little QEMU
> > > > > setting whose default is derived from the kernel. Individual
> > > > > defaults are opaque and controlled by the requested platform.
> > > > >
> > > > > Live migration has clearly defined semantics, and mgmt app can
> > > > > use query-platforms to validate two hosts are compatible.
> > > > >
> > > > > Omitting -platform should pick the very latest platform that is
> > > > > compatible with the current host (not necessarily the latest
> > > > > platform built-in to QEMU).
> > > >
> > > > This seems to add one more layer to maintain, and so far I don't know
> > > > whether it's a must.
> > > >
> > > > To put it simply, can we just rely on qemu cmdline as "the guest ABI"? I
> > > > thought it was mostly the case already, except some extremely rare
> > > > outliers.
> > > >
> > > > When we have one host that boots up a VM using:
> > > >
> > > >   $QEMU1 $cmdline
> > > >
> > > > Then another host boots up:
> > > >
> > > >   $QEMU2 $cmdline -incoming XXX
> > > >
> > > > Then migration should succeed if $cmdline is exactly the same, and the VM
> > > > can boot up all fine without errors on both sides.
> > > >
> > > > AFAICT this has nothing to do with what kernel is underneath, even not
> > > > Linux? I think either QEMU1 / QEMU2 has the option to fail. But if it
> > > > didn't, I thought the ABI should be guaranteed.
> > >
> > > We've got two mutually conflicting goals with the machine type
> > > definitions.
> > >
> > > Primarily we use them to ensure stable ABI, but an important
> > > secondary goal is to enable new tunables to have new defaults
> > > set, without having to update every mgmt app. The latter
> > > works very well when the defaults have no dependency on the
> > > platform kernel/OS, but breaks migration when they do have a
> > > platform dependency.
> > >
> > > > - Firstly, never quietly flipping any bit that affects the ABI...
> > > >
> > > > - Have a default value of off, then QEMU will always allow the VM to boot
> > > >   by default, while advanced users can opt-in on new features. We can't
> > > >   make this ON by default otherwise some VMs can already fail to boot,
> > > >
> > > > - If the host doesn't support the feature while the cmdline enabled it,
> > > >   it needs to fail QEMU boot rather than flipping, so that it says "hey,
> > > >   this host does not support running such VM specified, due to XXX
> > > >   feature missing".
> > > >
> > > > That's the only way a user could understand what happened, and IMHO that's
> > > > a clean way that we stick with QEMU cmdline on defining the guest ABI,
> > > > while in which the machine type is the foundation of such definition, as
> > > > the machine type can decide many of the rest of the compat properties.
> > > > And that's the whole point of the compat properties too (to make sure
> > > > the guest ABI is stable).
> > > >
> > > > If kernel breaks it easily, all compat property things that we maintain can
> > > > already stop making sense in general, because it didn't define the whole
> > > > guest ABI..
> > > >
> > > > So AFAIU that's really what we used for years, I hope I didn't overlook
> > > > something. And maybe we don't yet need the "-platform" layer if we can
> > > > keep up with this rule?
> > >
> > > We've failed at this for years wrt enabling use of new defaults that have
> > > a platform dependency, so historical practice isn't a good reference.
> > >
> > > There are 100's (possibly 1000's) of tunables set implicitly as part of
> > > the machine type, and of those, libvirt likely only exposes a few 10's
> > > of tunables. The vast majority are low level details that no mgmt app
> > > wants to know about, they just want to accept QEMU's new defaults,
> > > while preserving machine ABI. This is a good thing. No one wants the
> > > burden of wiring up every single tunable into libvirt and mgmt apps.
> > >
> > > This is what the "-platform" concept would be intended to preserve. It
> > > would allow a way to enable groups of settings that have a platform level
> > > dependency, without ever having to teach either libvirt or the mgmt apps
> > > about the individual tunables.
> >
> > Do you think we can achieve a similar goal by simply turning the feature
> > ON only after a few QEMU releases? I also mentioned that idea below.
> >
> >   https://lore.kernel.org/r/ZqQNKZ9_OPhDq2AK@x1n
> >
> > So far it really sounds like the right thing to do to me to fix all similar
> > issues, even without introducing anything new we need to maintain.
> >
> > To put that again, what we need to do is this:
> >
> > - To start: we should NEVER turn any guest ABI relevant bits
> >   automatically by QEMU, for sure..
> >
> > - When introducing any new device feature that may both (1) affect guest
> >   ABI, and (2) depend on host kernel features, we set those default
> >   values to OFF always at start. So this already covers old machine
> >   types, no compat property needed so far.
> >
> > - We always fail hard on QEMU boot whenever we detect that such a property
> >   is not supported by the current host while it is ON (and since it's OFF
> >   by default it must be that the user specified that ON).
> >
> > - After a stabilization period for that new feature to land in most
> >   kernels (we may consider looking at how major Linux distros update the
> >   kernel versions), when we're pretty sure the new feature should be
> >   available to most of the modern QEMU users, we add a patch to make the
> >   property default ON on the new machine type, and add a compat property
> >   for old machines.
> >
> > The last bullet also means we'll start to fail new machine type from
> > booting when running that very new QEMU on a very old kernel, but that's
> > the trade-off, and when doing it right on "stabilizing the feature in the
> > kernel world", it should really be a corner case. The user should simply
> > invoke an old machine type on that old kernel, even if the qemu is new.
>
> docs/about/build-platforms.rst already defines supported platforms. One of
> the supported platforms is Debian 11 (bullseye), and it carries Linux 5.10,
> which was released December 2020. If we follow this platform support, a new
> feature added to upstream Linux may take about 4 years before it gets
> enabled by default on QEMU.
>
> As an upstream developer, I feel it is too long, but I'm sure there are
> different opinions from different perspectives.

The above rule won't stop the supported platforms from still running the
QEMU binaries, am I right?
Especially for a serious user, the VMs should always be invoked with an
old machine type, and that shouldn't be impacted, as the old machine types
simply don't support such a new kernel feature.

The trade-off here only matters when the user tries to start the VM using
the default / latest machine type. Then, with the above rule, QEMU should
fail with a clear message about what needs to be turned OFF in order to
boot that VM. The user then has two options: turn that feature OFF
manually, or switch to an old machine type.

This is all still based on the fact that we do plan to keep that OFF for a
while. So if we think "a few years" is too long, one option is to set it
to ON after e.g. 1-2 years, so it's a middle ground where some users of
the new machine type will fail to boot the VM on old hosts, but it'll
start to benefit whoever runs the same on a new host.

So far I think it's not a major deal, especially considering that this
looks like the easiest and (still looks to me..) workable solution to make
migration always work, which IMHO is more important to serious VM users.

I'm definitely open to other options or suggestions if there are any. I
just don't see anything yet that is easily applicable..

Thanks,

--
Peter Xu
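
For illustration only, the rule discussed above (default OFF, fail hard
when ON is requested but unsupported, flip the default only on a newer
machine type) could be sketched roughly as below. This is not QEMU code:
MachineType, resolve_property, HostFeatureMissing, the machine names and
the "new-feature" property are all hypothetical names made up for the
sketch.

```python
# Hypothetical model of the proposed rule; none of these names are QEMU APIs.
from dataclasses import dataclass, field


class HostFeatureMissing(Exception):
    """Raised instead of silently flipping an ABI-relevant property."""


@dataclass
class MachineType:
    name: str
    # Defaults for properties that are guest-ABI relevant and host dependent.
    # A property absent from this dict defaults to OFF.
    defaults: dict = field(default_factory=dict)


def resolve_property(machine, prop, user_value, host_features):
    """Return the effective value of `prop`, or fail hard.

    user_value is None when the command line did not mention the property,
    in which case the machine type default applies.
    """
    if user_value is None:
        value = machine.defaults.get(prop, False)
    else:
        value = user_value
    if value and prop not in host_features:
        # Never flip the bit quietly: refuse to start and say why.
        raise HostFeatureMissing(
            f"{machine.name}: host lacks '{prop}'; "
            f"set {prop}=off or use an older machine type"
        )
    return value


# Old machine type: the feature stays OFF unless the user opts in.
old_machine = MachineType("machine-old")
# New machine type: default flipped to ON after the stabilization period.
new_machine = MachineType("machine-new", {"new-feature": True})
```

With this model, the same command line yields the same property values on
any host where it starts at all, which is the property migration relies
on. The new machine type on an old host fails with an explicit error
rather than booting with a different guest ABI.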