Hi Ganapatrao,

On Tue, 23 Jan 2024 09:55:32 +0000,
Ganapatrao Kulkarni <gankulkarni@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Marc,
>
> > +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> > +{
> > +	if (is_hyp_ctxt(vcpu)) {
> > +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> > +	} else {
> > +		write_lock(&vcpu->kvm->mmu_lock);
> > +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> > +		write_unlock(&vcpu->kvm->mmu_lock);
> > +	}
>
> Due to a race, a non-existent L2's MMU table gets loaded on some
> vCPUs while booting L1 (noticed when booting L1 with a large number
> of vCPUs). This happens because, at this early stage, E2H
> (hyp-context) is not yet set, so the trap on the ERET of L1's
> boot-strap code causes a context switch as if we were returning to
> L2 (guest enter), loading an uninitialized MMU table on those vCPUs
> and resulting in unrecoverable traps and aborts.

I'm not sure I understand the problem you're describing here. What is
the race exactly? Why isn't the shadow S2 good enough? Not having
HCR_EL2.VM set doesn't mean we can use the same S2, as the TLBs are
tagged by a different VMID, so staying on the canonical S2 seems
wrong.

My expectation is that the L1 ERET from EL2 to EL1 is trapped, and
that we pick an empty S2 and start populating it. What fails in this
process?

> Adding a check that stage 2 is enabled (the diff below fixes the
> issue) avoids the false ERET and continues with L1's MMU context.
>
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 340e2710cdda..1901dd19d770 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -759,7 +759,12 @@ void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
>
>  void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
>  {
> -	if (is_hyp_ctxt(vcpu)) {
> +	bool nested_stage2_enabled = vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_VM;
> +
> +	/*
> +	 * Load the L2 MMU only if nested_stage2_enabled, to avoid an
> +	 * MMU load due to a false ERET trap.
> +	 */
> +	if (is_hyp_ctxt(vcpu) || !nested_stage2_enabled) {
>  		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>  	} else {
>  		write_lock(&vcpu->kvm->mmu_lock);

As I said above, this doesn't look right.

> Hopefully we don't hit this once we move completely to the NV2-based
> implementation and E2H is always set?

No, the same constraints apply. I don't see why this would change.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.