Hi Alex,

On 12/05/2020 16:47, Alexandru Elisei wrote:
> On 5/12/20 12:17 PM, James Morse wrote:
>> On 11/05/2020 17:38, Alexandru Elisei wrote:
>>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>>> From: Christoffer Dall <christoffer.dall@xxxxxxx>
>>>>
>>>> As we are about to reuse our stage 2 page table manipulation code for
>>>> shadow stage 2 page tables in the context of nested virtualization, we
>>>> are going to manage multiple stage 2 page tables for a single VM.
>>>>
>>>> This requires some pretty invasive changes to our data structures,
>>>> which moves the vmid and pgd pointers into a separate structure and
>>>> change pretty much all of our mmu code to operate on this structure
>>>> instead.
>>>>
>>>> The new structure is called struct kvm_s2_mmu.
>>>>
>>>> There is no intended functional change by this patch alone.

>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>>  	u32 vmid;
>>>>  };
>>>>
>>>> -struct kvm_arch {
>>>> +struct kvm_s2_mmu {
>>>>  	struct kvm_vmid vmid;
>>>>
>>>> -	/* stage2 entry level table */
>>>> -	pgd_t *pgd;
>>>> -	phys_addr_t pgd_phys;
>>>> -
>>>> -	/* VTCR_EL2 value for this VM */
>>>> -	u64 vtcr;
>>>> +	/*
>>>> +	 * stage2 entry level table
>>>> +	 *
>>>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>>>> +	 * here. This happens when running a non-VHE guest hypervisor which
>>>> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>>>> +	 * with vHCR_EL2.VM == 0.

>>> It makes more sense to me to say that a non-VHE guest hypervisor will use the
>>> canonical stage *1* page table when running at EL2

>> Can KVM say anything about stage 1? It's totally under the guest's control, even at vEL2...

> It is.
> My interpretation of the comment was that if the guest doesn't have virtual
> stage 2 enabled (we're not running a guest of the L1 hypervisor), then the L0 host
> can use the same L0 stage 2 tables because we're running the same guest (the L1
> VM), regardless of the actual exception level for the guest.

I think you're right, but I can't see where stage 1 comes into it!

> If I remember
> correctly, KVM assigns different vmids for guests running at vEL1/0 and vEL2 with
> vHCR_EL2.VM == 0 because the translation regimes are different, but keeps the same
> translation tables.

Interesting. Is that because vEL2 really has ASIDs, so it needs its own VMID space?

>>> (the "Non-secure EL2 translation regime" as ARM DDI 0487F.b calls it on page
>>> D5-2543). I think that's the only situation where vEL2 and vEL1&0 will use the
>>> same L0 stage 2 tables. It's been quite some time since I reviewed the initial
>>> version of the NV patches, did I get that wrong?
>>
>>>> +	 */
>>>> +	pgd_t *pgd;
>>>> +	phys_addr_t pgd_phys;
>>>>
>>>>  	/* The last vcpu id that ran on each physical CPU */
>>>>  	int __percpu *last_vcpu_ran;

>>> It makes sense for the other fields to be part of kvm_s2_mmu, but I'm struggling
>>> to figure out why last_vcpu_ran is here. Would you mind sharing the rationale? I
>>> don't see this change in v1 or v2 of the NV series.

>> Marc may have a better rationale. My thinking was because kvm_vmid is in here too.
>>
>> last_vcpu_ran exists to prevent KVM accidentally emulating CnP without the opt-in.
>> (We call it de-facto CnP.)
>>
>> The guest may expect to be able to use asid-4 with different page tables on different

> I'm afraid I don't know what asid-4 is.

Sorry - 4 was just a random number! [0] I meant 'to use the same asid number on different vcpus'.

>> vCPUs, assuming the TLB isn't shared. But if KVM is switching between those vCPUs on one
>> physical CPU, the TLB is shared, ... the VMID and ASID are the same, but the page tables
>> are not. Not fun to debug!
>>
>> NV makes this problem per-stage2: because each stage 2 has its own VMID, we need to track
>> the vcpu_id that last ran this stage 2 on this physical CPU. If it's not the same, we need
>> to blow away this VMID's TLB entries.
>>
>> The workaround lives in virt/kvm/arm/arm.c::kvm_arch_vcpu_load()

> Makes sense, thank you for explaining that.

Great,


Thanks,

James

[0] https://xkcd.com/221/
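For anyone following along, the per-physical-CPU tracking described above can be modelled in user space roughly as below. This is a simplified sketch of the idea, not the kernel code: the names `run_vcpu`, `tlb_flushes` and `NR_CPUS` are illustrative, and the flush counter stands in for the hypercall that invalidates this VMID's local TLB entries.

```c
#include <string.h>

/*
 * Toy model of the "de-facto CnP" workaround: one last_vcpu_ran slot per
 * physical CPU. When a different vCPU of the same VM (same VMID) is loaded
 * onto a physical CPU, the previous vCPU's TLB entries for that VMID must
 * be invalidated, because the guest may legitimately use the same ASID
 * with different page tables on different vCPUs.
 */

#define NR_CPUS 4

static int last_vcpu_ran[NR_CPUS];
static int tlb_flushes;		/* counts simulated local TLB invalidations */

static void model_init(void)
{
	/* -1 means "no vCPU has run on this physical CPU yet". */
	memset(last_vcpu_ran, -1, sizeof(last_vcpu_ran));
	tlb_flushes = 0;
}

/* Load vcpu_id onto physical CPU 'cpu'; flush if a different vCPU ran last. */
static void run_vcpu(int cpu, int vcpu_id)
{
	if (last_vcpu_ran[cpu] != vcpu_id) {
		tlb_flushes++;	/* stands in for a local per-VMID TLB flush */
		last_vcpu_ran[cpu] = vcpu_id;
	}
}
```

With NV, the same check is what becomes per-stage2: each shadow stage 2 carries its own VMID, so each needs its own last-vCPU tracking per physical CPU.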