On Wed, Oct 3, 2018 at 5:40 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> The flexpriority module parameter instead is a kernel parameter because
> it's really only useful for debugging problems that only happen on old
> machines (and to a lesser extent, problems that only happen on pre-AVIC
> AMD machines). Making it a per-VM capability would not be particularly
> interesting, and it adds weight to the userspace ABI compatibility
> (userspace ABI for example is the reason why we cannot drop userspace
> irqchip support). But since these machines still exist and people are
> using KVM on them, we have to keep the debugging aid around.

It seems that the best way to mock up an older CPU that doesn't support
FlexPriority is to:

(a) intercept RDMSR of IA32_VMX_PROCBASED_CTLS2 and clear bit 0 of %rdx, and
(b) intercept VMWRITE to the secondary processor-based VM-execution
    controls field and emulate VMfail when a 1 is written to bit 0.

Pushing the code higher up in the stack complicates things quite a bit,
and encourages bizarre behaviors like the ones we have today. For
instance, FlexPriority support is still enumerated in the
IA32_VMX_PROCBASED_CTLS2 value synthesized for L1, and in fact, L1 can
still enable FlexPriority for L2. That certainly isn't consistent with the
behavior of old machines.
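For illustration, a rough sketch of the masking in (a) and the check in
(b). The helper names mock_procbased_ctls2() and
mock_vmwrite_secondary_fails() are hypothetical, not existing KVM
functions. The bit layout is assumed from the SDM: the high dword of
IA32_VMX_PROCBASED_CTLS2 (what RDMSR returns in %edx) holds the allowed-1
settings, and bit 0 of the secondary controls is "virtualize APIC
accesses" (the macro name mirrors KVM's vmx.h). How the VMfail is actually
delivered to L1 is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the secondary processor-based VM-execution controls. */
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES (1u << 0)

/*
 * Hypothetical hook for (a): given the real IA32_VMX_PROCBASED_CTLS2
 * value, return what L1 should see. The allowed-1 settings live in the
 * high dword (EDX), so clearing bit 0 of EDX hides FlexPriority.
 */
static uint64_t mock_procbased_ctls2(uint64_t host_value)
{
	uint32_t eax = (uint32_t)host_value;         /* allowed-0 settings */
	uint32_t edx = (uint32_t)(host_value >> 32); /* allowed-1 settings */

	edx &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
	return ((uint64_t)edx << 32) | eax;
}

/*
 * Hypothetical hook for (b): called on L1's VMWRITE to the secondary
 * controls field. Returns true if the write should be failed (VMfail)
 * because L1 tried to set the bit the mocked CPU doesn't support.
 */
static bool mock_vmwrite_secondary_fails(uint32_t value)
{
	return (value & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) != 0;
}
```

Keeping both checks at the VMX-capability level means L1 sees a
consistent picture: the feature is neither enumerated nor enableable.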