On Thu, Jul 22, 2021 at 06:45:14AM +0000, Shameerali Kolothum Thodi wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > index 83dc3b271bc5..42df9931ed9a 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > @@ -140,10 +140,10 @@ void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
> > >      __tlb_switch_to_host(&cxt);
> > >  }
> > >
> > > -void __kvm_flush_vm_context(void)
> > > +void __kvm_tlb_flush_local_all(void)
> > >  {
> > > -    dsb(ishst);
> > > -    __tlbi(alle1is);
> > > +    dsb(nshst);
> > > +    __tlbi(alle1);
> > >
> > >      /*
> > >       * VIPT and PIPT caches are not affected by VMID, so no maintenance
> > > @@ -155,7 +155,7 @@ void __kvm_flush_vm_context(void)
> > >       *
> > >       */
> > >      if (icache_is_vpipt())
> > > -        asm volatile("ic ialluis");
> > > +        asm volatile("ic iallu" : : );
> > >
> > > -    dsb(ish);
> > > +    dsb(nsh);
> >
> > Hmm, I'm wondering whether having this local stuff really makes sense for
> > VMIDs. For ASIDs, where rollover can be frequent and TLBI could result in
> > IPI on 32-bit, the local option was important, but here rollover is less
> > frequent, DVM is relied upon to work and the cost of a hypercall is
> > significant with nVHE.
> >
> > So I do think you could simplify patch 2 slightly to drop the
> > flush_pending and just issue inner-shareable invalidation on rollover.
> > With that, it might also make it straightforward to clear active_asids
> > when scheduling out a vCPU, which would solve the other problem I
> > mentioned about unnecessarily reserving a bunch of the VMID space.
>
> Ok. I will try out the above suggestion. Hope it will be acceptable for 8 bit
> VMID systems as well as there is a higher chance for rollover especially
> when we introduce pinned VMIDs(I am not sure such platforms care about
> Pinned VMID or not. If not, we could limit Pinned VMIDs to 16 bit systems).

So I woke up at 3am in a cold sweat after dreaming about this code.
I think my suggestion above still stands for the VMID allocator, but
interestingly, it would _not_ be valid for the ASID allocator. There, the
ASID is part of the active context, so during the window where the
active_asid is out of sync with the TTBR, receiving a broadcast TLBI from a
concurrent rollover wouldn't be enough to knock out the old ASID from the
TLB, despite it subsequently being made available for reallocation.

So the local TLB invalidation is not just a performance hint as I said; it's
crucial to the way the thing works (and this is also why the CnP code has to
switch to the reserved TTBR0).

As an aside: I'm more and more inclined to rip out the CnP stuff given that
it doesn't appear to bring any benefits, but does have some clear downsides.
Perhaps something for next week.

Will
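
The ASID race described above can be illustrated with a toy model. This is only a sketch under one reading of the argument, not kernel code: `struct cpu`, `local_tlbi`, `broadcast_tlbi`, `cpu_run`, `cpu_switch_to` and `stale_hit_after_rollover` are all hypothetical names, and the TLB is reduced to one slot per ASID recording which task's translations are cached there. The point it tries to show is that a broadcast invalidation only empties the TLB at one instant, while a CPU still running on the old TTBR can refill entries under the old ASID afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_ASIDS 4   /* hypothetical, tiny ASID space for the model */

/* Toy per-CPU state; every name here is illustrative, not a kernel API. */
struct cpu {
    int ttbr_asid;       /* ASID currently programmed in the TTBR */
    int ttbr_task;       /* task whose page tables the TTBR points at */
    int tlb[NUM_ASIDS];  /* task whose translations are cached per ASID, 0 = none */
    bool flush_pending;  /* local invalidation requested by a rollover */
};

/* Drop every cached translation on one CPU (models a local TLBI). */
static void local_tlbi(struct cpu *c)
{
    memset(c->tlb, 0, sizeof(c->tlb));
}

/* Models an inner-shareable TLBI: every CPU's TLB is emptied once, right now. */
static void broadcast_tlbi(struct cpu *cpus, int ncpus)
{
    for (int i = 0; i < ncpus; i++)
        local_tlbi(&cpus[i]);
}

/* Run on the current TTBR: fill on miss, return whose translations get used. */
static int cpu_run(struct cpu *c)
{
    assert(c->ttbr_asid >= 0 && c->ttbr_asid < NUM_ASIDS);
    if (c->tlb[c->ttbr_asid] == 0)
        c->tlb[c->ttbr_asid] = c->ttbr_task;
    return c->tlb[c->ttbr_asid];
}

/* Context switch: honour a pending local flush, then install the new TTBR. */
static void cpu_switch_to(struct cpu *c, int task, int asid)
{
    if (c->flush_pending) {
        local_tlbi(c);
        c->flush_pending = false;
    }
    c->ttbr_task = task;
    c->ttbr_asid = asid;
}

/* Returns true if task 2 ends up using task 1's stale translations. */
bool stale_hit_after_rollover(bool use_local_flush)
{
    struct cpu cpus[2];
    memset(cpus, 0, sizeof(cpus));
    struct cpu *c0 = &cpus[0];

    /* CPU0 runs task 1 under ASID 1 and caches its translations. */
    cpu_switch_to(c0, 1, 1);
    cpu_run(c0);

    /* Rollover on another CPU: ASID 1 is treated as free again. */
    broadcast_tlbi(cpus, 2);
    c0->flush_pending = use_local_flush;

    /* CPU0's TTBR still carries ASID 1, so it refills the TLB. */
    cpu_run(c0);

    /* ASID 1 is reallocated to task 2, which is scheduled on CPU0. */
    cpu_switch_to(c0, 2, 1);
    return cpu_run(c0) != 2;
}
```

In this model, `stale_hit_after_rollover(false)` reports a stale hit because nothing invalidates the entries refilled after the broadcast, whereas `stale_hit_after_rollover(true)` does not, since the pending local flush runs at the context switch that installs the reallocated ASID.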