On Wed, 05 May 2021 17:46:51 +0100, Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> Hi Zenghui,
>
> On Wed, 05 May 2021 15:23:02 +0100,
> Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote:
> >
> > Hi Marc,
> >
> > On 2020/11/3 0:40, Marc Zyngier wrote:
> > > In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> > > systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> > > to increment PC post emulation is now signalled via a flag in the
> > > vcpu structure.
> > >
> > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> >
> > [...]
> >
> > > @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> > >  	__load_guest_stage2(vcpu->arch.hw_mmu);
> > >  	__activate_traps(vcpu);
> > > +	__adjust_pc(vcpu);
> >
> > If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> > handling a PSCI CPU_ON call targeting this VCPU, the *target_pc* (aka
> > entry point address, normally provided by the primary VCPU) will be
> > unexpectedly incremented here. That's pretty bad, I think.
>
> How can you online a CPU using PSCI if that CPU is currently spinning
> on a WFI? Or is it that we have transitioned via userspace to perform
> the vcpu reset? I can imagine it happening in that case.
>
> > This was noticed with a latest guest kernel, at least with commit
> > dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
> > stopped VCPUs in the WFx loop. The guest kernel shouted at me that
> >
> > 	"CPU: CPUs started in inconsistent modes"
>
> Ah, the perks of running guests with "quiet"... Well caught.
>
> > *after* rebooting. The problem is that the secondary entry point was
> > corrupted by KVM as explained above. All of the secondary processors
> > started from set_cpu_boot_mode_flag(), with w0=0. Oh well...
> >
> > I wrote the diff below and guess it will help. But I have to look at
> > all other places where we adjust PC directly to make a right fix.
> > Please let me know what you think.
> >
> >
> > Thanks,
> > Zenghui
> >
> > ---->8----
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index 956cdc240148..ed647eb387c3 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  	if (vcpu->arch.reset_state.be)
> >  		kvm_vcpu_set_be(vcpu);
> >
> > +	/*
> > +	 * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
> > +	 * using this version of __adjust_pc().
> > +	 */
> >  	*vcpu_pc(vcpu) = target_pc;
> > +	vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;

Actually, this is far worse than it looks, and this only papers over
one particular symptom. We need to resolve all pending PC updates
*before* returning to userspace, or things like live migration can
observe an inconsistent state.

I'll try and cook something up.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
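
For anyone following along, the ordering problem discussed above can be
modelled in a few lines of standalone C. This is only a toy sketch, not
kernel code: struct toy_vcpu, TOY_INCREMENT_PC and every toy_*() helper
are hypothetical stand-ins for the vcpu state, KVM_ARM64_INCREMENT_PC,
__adjust_pc() and kvm_reset_vcpu(). The last helper is merely one
possible reading of "resolve all pending PC updates before returning to
userspace"; it does not claim to be the eventual fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for KVM_ARM64_INCREMENT_PC. */
#define TOY_INCREMENT_PC (1u << 0)

/* Hypothetical, heavily reduced model of the vcpu state. */
struct toy_vcpu {
	uint64_t pc;
	unsigned int flags;
};

/* Models __adjust_pc(): apply a deferred "skip one instruction" request. */
static void toy_adjust_pc(struct toy_vcpu *vcpu)
{
	assert((vcpu->pc & 3) == 0);	/* AArch64 instructions are 4-byte aligned */
	if (vcpu->flags & TOY_INCREMENT_PC) {
		vcpu->pc += 4;
		vcpu->flags &= ~TOY_INCREMENT_PC;
	}
}

/* Models the reset path; clear_pending selects the workaround from the diff. */
static void toy_reset_vcpu(struct toy_vcpu *vcpu, uint64_t target_pc,
			   int clear_pending)
{
	vcpu->pc = target_pc;
	if (clear_pending)
		vcpu->flags &= ~TOY_INCREMENT_PC;
}

/*
 * WFx emulation leaves an increment pending, the vcpu is then reset to
 * target_pc, and the pending increment is applied on guest entry.
 * Returns the PC the guest would actually start from.
 */
uint64_t toy_entry_pc(uint64_t target_pc, int clear_pending)
{
	struct toy_vcpu vcpu = { .pc = 0x1000, .flags = 0 };

	vcpu.flags |= TOY_INCREMENT_PC;		/* WFx emulated */
	toy_reset_vcpu(&vcpu, target_pc, clear_pending);
	toy_adjust_pc(&vcpu);			/* guest entry */
	return vcpu.pc;
}

/*
 * Same sequence, but the pending update is resolved before the state
 * becomes visible outside the run loop, so the reset path has nothing
 * stale left to trip over.
 */
uint64_t toy_entry_pc_flushed(uint64_t target_pc)
{
	struct toy_vcpu vcpu = { .pc = 0x1000, .flags = 0 };

	vcpu.flags |= TOY_INCREMENT_PC;		/* WFx emulated */
	toy_adjust_pc(&vcpu);			/* resolved on the way out */
	toy_reset_vcpu(&vcpu, target_pc, 0);
	toy_adjust_pc(&vcpu);			/* guest entry: nothing pending */
	return vcpu.pc;
}
```

With a target of 0x80000, toy_entry_pc(0x80000, 0) models the corrupted
entry point (one instruction past the target), while both
toy_entry_pc(0x80000, 1) and toy_entry_pc_flushed(0x80000) start at the
requested address. Only the flushed variant also leaves pc and flags
consistent at the point where the state could be observed from outside.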