On Fri, Jul 26, 2024 at 04:47:40PM -0400, Peter Xu wrote:
On Fri, Jul 26, 2024 at 04:17:12PM +0100, Daniel P. Berrangé wrote:
In terms of launching QEMU I'd imagine:
$QEMU -machine pc-q35-9.1 -platform linux-6.9 ...args...
Any virtual machine HW features which are tied to host kernel features
would have their defaults set based on the requested -platform. The
-machine will be fully invariant wrt the host kernel.
You would have -platform help to list available platforms, and a
corresponding QMP "query-platforms" command to list what platforms
are supported on a given host OS.
Downstream distros can provide their own platforms definitions
(eg "linux-rhel-9.5") if they have kernels whose feature set
diverges from upstream due to backports.
Mgmt apps won't need to be taught about every single little QEMU
setting whose default is derived from the kernel. Individual
defaults are opaque and controlled by the requested platform.
Live migration has clearly defined semantics, and mgmt app can
use query-platforms to validate two hosts are compatible.
Omitting -platform should pick the very latest platform that is
compatible with the current host (not necessarily the latest
platform built-in to QEMU).
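A rough sketch of the selection and compatibility rules described above,
purely for illustration. Nothing here is QEMU code: the feature names
("feat-a" etc.), the per-platform feature sets, and the helper functions are
all hypothetical, and only the "linux-6.9" naming style comes from the
proposal itself.

```python
# Hypothetical sketch only: none of these names or feature sets exist in QEMU.
# Each platform lists the host kernel features it relies on, oldest first.
PLATFORMS = [
    ("linux-6.1", {"feat-a"}),
    ("linux-6.6", {"feat-a", "feat-b"}),
    ("linux-6.9", {"feat-a", "feat-b", "feat-c"}),
]


def query_platforms(host_features):
    """Return the platforms whose required features this host provides."""
    return [name for name, required in PLATFORMS if required <= host_features]


def default_platform(host_features):
    """Pick the newest platform the host supports (the 'omitted -platform' case)."""
    supported = query_platforms(host_features)
    if not supported:
        raise RuntimeError("no supported platform on this host")
    return supported[-1]


def common_platforms(src_features, dst_features):
    """Platforms usable on both hosts, i.e. safe choices for live migration."""
    dst = set(query_platforms(dst_features))
    return [name for name in query_platforms(src_features) if name in dst]
```

Under these assumptions, a mgmt app would compare the query results from the
two hosts and launch the VM with a platform present in both lists, without
knowing which individual tunables that platform controls.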
This seems to add one more layer to maintain, and so far I don't know
whether it's a must.
To put it simply, can we just rely on the qemu cmdline as "the guest ABI"? I
thought it was mostly the case already, except some extremely rare
outliers.
When we have one host that boots up a VM using:
$QEMU1 $cmdline
Then another host boots up:
$QEMU2 $cmdline -incoming XXX
Then migration should succeed if $cmdline is exactly the same, and the VM
can boot up all fine without errors on both sides.
AFAICT this has nothing to do with what kernel is underneath, or even
whether it is Linux at all? I think either of QEMU1 / QEMU2 has the option
to fail. But if neither did, I thought the ABI should be guaranteed.
We've got two mutually conflicting goals with the machine type
definitions.
Primarily we use them to ensure stable ABI, but an important
secondary goal is to enable new tunables to have new defaults
set, without having to update every mgmt app. The latter
works very well when the defaults have no dependency on the
platform kernel/OS, but breaks migration when they do have a
platform dependency.
- Firstly, never quietly flipping any bit that affects the ABI...
- Have a default value of off, then QEMU will always allow the VM to boot
by default, while advanced users can opt-in on new features. We can't
make this ON by default, otherwise some VMs can already fail to boot.
- If the host doesn't support the feature while the cmdline enabled it,
it needs to fail QEMU boot rather than flipping, so that it says "hey,
this host does not support running such VM specified, due to XXX
feature missing".
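The rules in the list above could be sketched roughly as follows. This is
only an illustration of the policy, not how QEMU implements property
handling; the function, the exception type, and the feature name used in
the usage note are hypothetical.

```python
# Hypothetical sketch of the "default off, fail loudly, never flip" policy.
class HostFeatureMissing(Exception):
    """Raised when the cmdline asks for a feature the host cannot provide."""


def resolve_feature(name, requested, host_supports):
    """Decide the final value of a host-dependent feature.

    requested: True if the cmdline enabled the feature, None or False otherwise.
    host_supports: whether the host kernel provides the feature.
    """
    if not requested:
        # Default is off, so the VM boots on any host.
        return False
    if not host_supports:
        # Never silently turn the feature off: refuse to start instead.
        raise HostFeatureMissing(
            f"this host does not support running the VM as specified: "
            f"feature '{name}' is missing"
        )
    return True
```

For example, resolve_feature("some-feature", True, False) raises rather than
returning False, so the value seen by the guest always matches the cmdline.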
That's the only way a user could understand what happened, and IMHO that's
a clean way for us to stick with the QEMU cmdline as defining the guest ABI,
with the machine type as the foundation of that definition, since the
machine type can decide many of the remaining compat properties. And that's
the whole point of the compat properties too (to make sure the guest ABI is
stable).
If the kernel can break it easily, all the compat property things that we
maintain stop making sense in general, because they don't define the whole
guest ABI.
So AFAIU that's really what we have used for years, I hope I didn't overlook
something. And maybe we don't yet need the "-platform" layer if we can
keep up with this rule?
We've failed at this for years wrt enabling use of new defaults that have
a platform dependency, so historical practice isn't a good reference.
There are 100's (possibly 1000's) of tunables set implicitly as part of
the machine type, and of those, libvirt likely only exposes a few 10's
of tunables. The vast majority are low level details that no mgmt app
wants to know about; they just want to accept QEMU's new defaults,
while preserving machine ABI. This is a good thing. No one wants the
burden of wiring up every single tunable into libvirt and mgmt apps.
This is what the "-platform" concept would be intended to preserve. It
would allow a way to enable groups of settings that have a platform level
dependency, without ever having to teach either libvirt or the mgmt apps
about the individual tunables.