On 12/15/21 11:09, Thomas Gleixner wrote:
> Let's assume the restore order is XSTATE, XCR0, XFD:
>
>   XSTATE has everything in init state, which means the default
>   buffer is good enough
>
>   XCR0 has everything enabled including AMX, so the buffer is
>   expanded
>
>   XFD has AMX disable set, which means the buffer expansion was
>   pointless
>
> If we go there, then we can just use a full expanded buffer for KVM
> unconditionally and be done with it. That spares a lot of code.

If we decide to use a full expanded buffer as soon as KVM_SET_CPUID2 is
done, that would work for me. Basically KVM_SET_CPUID2 would:
- check bits from CPUID[0xD] against the prctl requested with GUEST_PERM
- return with -ENXIO or whatever if any dynamic bits were not requested
- otherwise call fpstate_realloc if there are any dynamic bits requested
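
To make the three steps above concrete, here is a rough userspace sketch in C. It is not kernel code: struct guest_fpu_sketch, fpstate_realloc_sketch() and set_cpuid2_check_sketch() are hypothetical names, the real fpstate_realloc is only stubbed out, and treating AMX tile data (XCR0 bit 18) as the only dynamic feature is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* XSAVE feature bit for AMX tile data (XCR0 bit 18). */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)
/* Assumption: AMX tile data is the only dynamically enabled feature. */
#define XFEATURE_MASK_DYNAMIC    XFEATURE_MASK_XTILE_DATA

/* Hypothetical stand-in for the per-vCPU FPU state; not the kernel's struct. */
struct guest_fpu_sketch {
    uint64_t perm;      /* features requested via prctl with GUEST_PERM */
    bool     expanded;  /* true once the buffer has been reallocated */
};

/* Hypothetical stand-in for fpstate_realloc; it only records the expansion. */
static int fpstate_realloc_sketch(struct guest_fpu_sketch *fpu)
{
    fpu->expanded = true;
    return 0;
}

/*
 * Sketch of the KVM_SET_CPUID2 check. cpuid_xfeatures is the feature mask
 * that userspace advertises to the guest in CPUID[0xD].
 */
static int set_cpuid2_check_sketch(struct guest_fpu_sketch *fpu,
                                   uint64_t cpuid_xfeatures)
{
    uint64_t dynamic = cpuid_xfeatures & XFEATURE_MASK_DYNAMIC;

    if (dynamic & ~fpu->perm)
        return -ENXIO;  /* dynamic bit exposed without prior permission */
    if (dynamic)
        return fpstate_realloc_sketch(fpu);
    return 0;           /* no dynamic features: default buffer is enough */
}
```

With this shape the buffer size is settled once at KVM_SET_CPUID2 time, so later XCR0 or XFD writes never need to trigger a reallocation.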
Considering that in practice all Linux guests with AMX would have XFD
passthrough (because if there's no prctl, Linux keeps AMX disabled in
XFD), this removes the need to do all the #NM handling too. Just make
XFD passthrough if it can ever be set to a nonzero value. This costs an
RDMSR per vmexit even if neither the host nor the guest ever use AMX.
That said, if we don't want to use a full expanded buffer, I don't
expect any issue with requiring XFD first then XCR0 then XSAVE. As Juan
said, QEMU first gets everything from the migration stream and then
restores it. So yes, the QEMU code is complicated and messy but we can
change the order without breaking migration from old to new QEMU. QEMU
also forbids migration if there's any CPUID feature that it does not
understand, so the old versions that don't understand AMX won't migrate
AMX (with no possibility to override).
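
Why the order matters if the buffer is expanded on demand can be shown with a toy model in C. Everything here is hypothetical (struct vcpu_sketch, maybe_expand(), the set_* helpers). It assumes XFD starts out as zero and that the buffer is expanded as soon as AMX tile data is enabled in XCR0 and not disabled in XFD.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSAVE feature bit for AMX tile data (XCR0 bit 18). */
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Hypothetical model of the state that drives the sizing decision. */
struct vcpu_sketch {
    uint64_t xcr0;
    uint64_t xfd;       /* assumed to start out as 0 */
    bool     expanded;  /* set once, never shrunk again */
};

/* Assumed rule: expand once AMX is enabled in XCR0 and not disabled in XFD. */
static void maybe_expand(struct vcpu_sketch *v)
{
    if (v->xcr0 & ~v->xfd & XFEATURE_MASK_XTILE_DATA)
        v->expanded = true;
}

static void set_xcr0(struct vcpu_sketch *v, uint64_t val)
{
    v->xcr0 = val;
    maybe_expand(v);
}

static void set_xfd(struct vcpu_sketch *v, uint64_t val)
{
    v->xfd = val;
    maybe_expand(v);
}

/* Restore XCR0 before XFD; returns whether the buffer got expanded. */
static bool restore_xcr0_then_xfd(uint64_t xcr0, uint64_t xfd)
{
    struct vcpu_sketch v = { 0 };

    set_xcr0(&v, xcr0);
    set_xfd(&v, xfd);
    return v.expanded;
}

/* Restore XFD before XCR0; returns whether the buffer got expanded. */
static bool restore_xfd_then_xcr0(uint64_t xcr0, uint64_t xfd)
{
    struct vcpu_sketch v = { 0 };

    set_xfd(&v, xfd);
    set_xcr0(&v, xcr0);
    return v.expanded;
}
```

For a guest that has AMX enabled in XCR0 but disabled in XFD, the first order expands the buffer for nothing, while the second leaves the default buffer alone.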
Paolo