>> > > This won't work with nested AVIC - we can't just inhibit a nested guest using its own AVIC,
>> > > because migration happens.
>> >
>> > I mean because host decided to change its apic id, which it can in theory do any time,
>> > even after the nested guest has started. Seriously, the only reason guest has to change apic id,
>> > is to try to exploit some security hole.
>>
>> Hi
>>
>> Thanks for the information.
>>
>> IIUC, you mean KVM applies APICv inhibition only to L1 VM, leaving APICv
>> enabled for L2 VM. Shouldn't KVM disable APICv for L2 VM in this case?
>> It looks like a generic issue in dynamically toggling APICv scheme,
>> e.g., qemu can set KVM_GUESTDBG_BLOCKIRQ after nested guest has started.
>>
>
>That is the problem - you can't disable it for L2, unless you are willing to emulate it in software.
>Or in other words, when nested guest uses a hardware feature, you can't at some point say to it:
>sorry buddy - hardware feature disappeared.

Agreed. I missed this.

>
>It is *currently* not a problem for APICv because it doesn't do IPI virtualization,
>and even with these patches, it doesn't do this for nesting.
>It does become when you allow nested guest to use this which I did in the nested AVIC code.
>
>
>and writable apic ids do pose a large problem, since nested AVIC, will target L1's apic ids,
>and when they can change under you without any notice, and even worse be duplicate,
>it is just nightmare.

OK. So the problem with disabling APICv instead of making the APIC ID
read-only is that, although it works perfectly for VMX IPIv, it makes
future cleanup of AVIC difficult or impossible, because nested AVIC is
practically impossible to implement without assuming that L1's APIC IDs
are immutable.

Sean & Maxim,

How about going back to a module parameter to opt in to read-only APIC
IDs? Migration may fail in some cases, but that shouldn't be a big
issue, as migrating VMs from a KVM with nested=on to a KVM with
nested=off may also fail.
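
To make the opt-in idea concrete, here is a rough user-space model of
the behaviour I have in mind. It is only a sketch, not kernel code and
not tested against KVM: the `apic_id_readonly` flag stands in for a
module parameter, and the `model_*` names and struct are hypothetical,
made up for illustration. With the flag off, the APIC ID stays writable
as today; with it on, any attempt to set an APIC ID different from the
vCPU ID is rejected, which is also where restoring a migrated VM with a
modified APIC ID would fail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a module parameter; not an existing KVM knob. */
static bool apic_id_readonly = false;

/* Hypothetical, minimal model of per-vCPU state. */
struct model_vcpu {
	uint32_t vcpu_id;	/* fixed when the vCPU is created */
	uint32_t apic_id;	/* APIC ID currently visible to the guest */
};

/* The APIC ID starts out equal to the vCPU ID. */
static void model_vcpu_init(struct model_vcpu *v, uint32_t id)
{
	v->vcpu_id = id;
	v->apic_id = id;
}

/*
 * Model of a guest or userspace write to the APIC ID.
 * Returns 0 on success, or -EINVAL when read-only APIC IDs are opted
 * in and the new value differs from the vCPU ID.
 */
static int model_set_apic_id(struct model_vcpu *v, uint32_t new_id)
{
	if (apic_id_readonly && new_id != v->vcpu_id)
		return -EINVAL;

	v->apic_id = new_id;
	return 0;
}
```

The point of making this opt-in is that existing setups keep the
writable behaviour unless the admin asks for the stricter one.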