On Thu, 06 May 2021 12:43:26 +0100,
Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote:
>
> On 2021/5/6 14:33, Marc Zyngier wrote:
> > On Wed, 05 May 2021 17:46:51 +0100,
> > Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >>
> >> Hi Zenghui,
> >>
> >> On Wed, 05 May 2021 15:23:02 +0100,
> >> Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote:
> >>>
> >>> Hi Marc,
> >>>
> >>> On 2020/11/3 0:40, Marc Zyngier wrote:
> >>>> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> >>>> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> >>>> to increment PC post emulation is now signalled via a flag in the
> >>>> vcpu structure.
> >>>>
> >>>> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> >>>
> >>> [...]
> >>>
> >>>> @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >>>>   	__load_guest_stage2(vcpu->arch.hw_mmu);
> >>>>   	__activate_traps(vcpu);
> >>>>  +	__adjust_pc(vcpu);
> >>>
> >>> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> >>> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
> >>> entry point address, normally provided by the primary VCPU) will be
> >>> unexpectedly incremented here. That's pretty bad, I think.
> >>
> >> How can you online a CPU using PSCI if that CPU is currently spinning
> >> on a WFI? Or is that we have transitioned via userspace to perform the
> >> vcpu reset? I can imagine it happening in that case.
>
> I hadn't tried to reset VCPU from userspace. That would be a much easier
> way to reproduce this problem.

Then I don't understand how you end-up there. If the vcpu was in WFI,
it wasn't off and PSCI_CPU_ON doesn't have any effect.

> > Actually, this is far worse than it looks, and this only papers over
> > one particular symptom. We need to resolve all pending PC updates
> > *before* returning to userspace, or things like live migration can
> > observe an inconsistent state.
>
> Ah yeah, agreed.
>
> Apart from the PC manipulation, I noticed that when handling the user
> GET_VCPU_EVENTS request:
>
> |	/*
> |	 * We never return a pending ext_dabt here because we deliver it to
> |	 * the virtual CPU directly when setting the event and it's no longer
> |	 * 'pending' at this point.
> |	 */
>
> Which isn't true anymore now that we defer the exception injection right
> before the VCPU entry.

I believe the comment will be valid again once I fix the core issue,
which is that we shouldn't return to userspace with pending PC
adjustments. As long as KVM_GET_VCPU_EVENTS isn't issued on a running
vcpu (which looks pointless to me), this should be just fine.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
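
For readers following the thread, the hazard discussed above can be
sketched with a small userspace toy model. This is not kernel code:
every toy_* name, the flag value, the addresses and the fixed 4-byte
instruction size are assumptions made up for illustration. It only
shows why a PC increment that is still pending when the PC gets
overwritten (by a reset, for example) ends up applied to the new
target, and how resolving the pending update before leaving the run
loop avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag, standing in for the "increment PC" request. */
#define TOY_INCREMENT_PC (1u << 0)

/* Hypothetical, heavily reduced vcpu state. */
struct toy_vcpu {
	uint64_t pc;
	unsigned int flags;
};

/* Emulation path: record the intent to skip the instruction. */
static void toy_request_skip_instr(struct toy_vcpu *v)
{
	v->flags |= TOY_INCREMENT_PC;
}

/* Apply a pending PC update, if any, and clear the request. */
static void toy_adjust_pc(struct toy_vcpu *v)
{
	if (v->flags & TOY_INCREMENT_PC) {
		v->pc += 4;	/* assumption: 4-byte instructions only */
		v->flags &= ~TOY_INCREMENT_PC;
	}
}

/* Reset that overwrites PC but knows nothing about the flag. */
static void toy_reset(struct toy_vcpu *v, uint64_t target_pc)
{
	v->pc = target_pc;
}

/* Leave the run loop with no PC update left pending. */
static void toy_exit_to_userspace(struct toy_vcpu *v)
{
	toy_adjust_pc(v);
	assert(!(v->flags & TOY_INCREMENT_PC));
}

/* Pending update survives the exit: the reset target gets bumped. */
static uint64_t toy_entry_pc_deferred(void)
{
	struct toy_vcpu v = { .pc = 0x1000, .flags = 0 };

	toy_request_skip_instr(&v);
	toy_reset(&v, 0x80000);	/* flag is still set here */
	toy_adjust_pc(&v);	/* applied at the next entry */
	return v.pc;
}

/* Pending update resolved before the exit: the target is untouched. */
static uint64_t toy_entry_pc_resolved(void)
{
	struct toy_vcpu v = { .pc = 0x1000, .flags = 0 };

	toy_request_skip_instr(&v);
	toy_exit_to_userspace(&v);
	toy_reset(&v, 0x80000);
	toy_adjust_pc(&v);	/* nothing pending, no-op */
	return v.pc;
}
```

Calling the two helpers from a small main() shows the difference: in
this model the deferred variant enters at the reset target plus one
instruction, while the resolved variant enters exactly at the reset
target.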