This patch adds a new KVM capability to address a crash we're currently having inside the nested guest kernel when running with GTSE disabled in the nested hypervisor. The summary is: We allow any guest a cmdline override of GTSE for migration purposes. The nested guest does not know it needs to use the option and tries to run 'tlbie' with LPCR_GTSE=0. The details are a bit more intricate: QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init, guests use the OV5 value to set MMU_FTR_GTSE. This setting can be overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The option itself depends on the availability of FW_FEATURE_RPT_INVALIDATE, which is tied to QEMU's cap-rpt-invalidate capability. The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline override, in which case: MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0 We don't allow the nested hypervisor to set some LPCR bits for its nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests will also have LPCR_GTSE=0. But since the only thing that can really flip GTSE is the cmdline override, if a nested guest runs without it, then the sequence goes: MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0. With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged. How the new capability helps: By having QEMU consult KVM on what the correct GTSE value is, we can have the nested hypervisor return the same value that it is currently using. QEMU will then put the correct value in the device-tree for the nested guest and MMU_FTR_GTSE will match LPCR_GTSE. Fixes: b87cc116c7e1 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability") Signed-off-by: Fabiano Rosas <farosas@xxxxxxxxxxxxx> --- This supersedes the previous RFC: "KVM: PPC: Book3s HV: Allow setting GTSE for the nested guest"*. Aneesh explained to me that we don't want to allow L1 and L2 GTSE values to differ. *- https://lore.kernel.org/r/20220304182657.2489303-1-farosas@xxxxxxxxxxxxx --- arch/powerpc/kvm/powerpc.c | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2ad0ccd202d5..dd08b3b729cd 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -677,6 +677,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_PPC_RPT_INVALIDATE: r = 1; break; + case KVM_CAP_PPC_GTSE: + r = mmu_has_feature(MMU_FTR_GTSE); + break; #endif default: r = 0; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 507ee1f2aa96..cc581e345d2a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1135,6 +1135,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_XSAVE2 208 #define KVM_CAP_SYS_ATTRIBUTES 209 #define KVM_CAP_PPC_AIL_MODE_3 210 +#define KVM_CAP_PPC_GTSE 211 #ifdef KVM_CAP_IRQ_ROUTING -- 2.34.1