On Tue, Oct 11, 2022 at 05:54:00PM +0100, Marc Zyngier wrote:
> The kernel has an awfully complicated boot sequence in order to cope
> with the various EL2 configurations, including those that "enhanced"
> the architecture. We go from EL2 to EL1, then back to EL2, staying
> at EL2 if VHE capable and otherwise go back to EL1.
>
> Here's a paracetamol tablet for you.

Heh, still have a bit of a headache from this :)

I'm having a hard time following where we skip the EL2 promotion based
on __boot_cpu_mode. On the cpu_resume() path it looks like we take the
return of init_kernel_el() and pass that along to finalise_el2(). As we
are in EL1 at this point, it seems like we'd go init_kernel_el() ->
init_el1().

What am I missing?

--
Thanks,
Oliver

> The cpu_resume path follows the same logic, because coming up with
> two versions of a square wheel is hard.
>
> However, things aren't this straightforward with pKVM, as the host
> resume path is always proxied by the hypervisor, which means that
> the kernel is always entered at EL1. Which contradicts what the
> __boot_cpu_mode[] array contains (it obviously says EL2).
>
> This thus triggers a HVC call from EL1 to EL2 in a vain attempt
> to upgrade from EL1 to EL2 VHE, which we are, funnily enough,
> reluctant to grant to the host kernel. This is also completely
> unexpected, and puzzles your average EL2 hacker.
>
> Address it by fixing up the boot mode at the point the host gets
> deprivileged. is_hyp_mode_available() and co already have a static
> branch to deal with this, making it pretty safe.
>
> Reported-by: Vincent Donnefort <vdonnefort@xxxxxxxxxx>
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/arm.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b6c9bfa8492f..cf075c9b9ab1 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2107,6 +2107,17 @@ static int pkvm_drop_host_privileges(void)
>  	 * once the host stage 2 is installed.
>  	 */
>  	static_branch_enable(&kvm_protected_mode_initialized);
> +
> +	/*
> +	 * Fixup the boot mode so that we don't take spurious round
> +	 * trips via EL2 on cpu_resume. Flush to the PoC for a good
> +	 * measure, so that it can be observed by a CPU coming out of
> +	 * suspend with the MMU off.
> +	 */
> +	__boot_cpu_mode[0] = __boot_cpu_mode[1] = BOOT_CPU_MODE_EL1;
> +	dcache_clean_poc((unsigned long)__boot_cpu_mode,
> +			 (unsigned long)(__boot_cpu_mode + 2));
> +
>  	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
>  	return ret;
>  }
> --
> 2.34.1
>