On Wed, Mar 29, 2023 at 5:42 PM Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
>
> On Mon, Feb 06, 2023 at 05:23:40PM +0000, Raghavendra Rao Ananta wrote:
> > The current implementation of the stage-2 unmap walker
> > traverses the entire page-table to clear and flush the TLBs
> > for each entry. This could be very expensive, especially if
> > the VM is not backed by hugepages. The unmap operation could be
> > made efficient by disconnecting the table at the very
> > top (level at which the largest block mapping can be hosted)
> > and do the rest of the unmapping using free_removed_table().
> > If the system supports FEAT_TLBIRANGE, flush the entire range
> > that has been disconnected from the rest of the page-table.
> >
> > Suggested-by: Ricardo Koller <ricarkol@xxxxxxxxxx>
> > Signed-off-by: Raghavendra Rao Ananta <rananta@xxxxxxxxxx>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 44 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 0858d1fa85d6b..af3729d0971f2 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -1017,6 +1017,49 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> >  	return 0;
> >  }
> >
> > +/*
> > + * The fast walker executes only if the unmap size is exactly equal to the
> > + * largest block mapping supported (i.e. at KVM_PGTABLE_MIN_BLOCK_LEVEL),
> > + * such that the underneath hierarchy at KVM_PGTABLE_MIN_BLOCK_LEVEL can
> > + * be disconnected from the rest of the page-table without the need to
> > + * traverse all the PTEs, at all the levels, and unmap each and every one
> > + * of them. The disconnected table is freed using free_removed_table().
> > + */
> > +static int fast_stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > +				    enum kvm_pgtable_walk_flags visit)
> > +{
> > +	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > +	kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);
> > +	struct kvm_s2_mmu *mmu = ctx->arg;
> > +
> > +	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_MIN_BLOCK_LEVEL)
> > +		return 0;
> > +
> > +	if (!stage2_try_break_pte(ctx, mmu))
> > +		return -EAGAIN;
> > +
> > +	/*
> > +	 * Gain back a reference for stage2_unmap_walker() to free
> > +	 * this table entry from KVM_PGTABLE_MIN_BLOCK_LEVEL - 1.
> > +	 */
> > +	mm_ops->get_page(ctx->ptep);
>
> Doesn't this run the risk of a potential UAF if the refcount was 1 before
> calling stage2_try_break_pte()? IOW, stage2_try_break_pte() will drop
> the refcount to 0 on the page before this ever gets called.
>
> Also, AFAICT this misses the CMOs that are required on systems w/o
> FEAT_FWB. Without them it is possible that the host will read something
> other than what was most recently written by the guest if it is using
> noncacheable memory attributes at stage-1.
>
> I imagine the actual bottleneck is the DSB required after every
> CMO/TLBI. Theoretically, the unmap path could be updated to:
>
>  - Perform the appropriate CMOs for every valid leaf entry *without*
>    issuing a DSB.
>
>  - Elide TLBIs entirely that take place in the middle of the walk
>
>  - After the walk completes, dsb(ish) to guarantee that the CMOs have
>    completed and the invalid PTEs are made visible to the hardware
>    walkers. This should be done implicitly by the TLBI implementation
>
>  - Invalidate the [addr, addr + size) range of IPAs
>
> This would also avoid over-invalidating stage-1 since we blast the
> entire stage-1 context for every stage-2 invalidation. Thoughts?
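To make the cost argument above concrete, here is a rough, self-contained
C sketch of the two orderings. It is a toy model only: toy_stats,
toy_unmap_per_entry(), toy_unmap_batched() and toy_run() are hypothetical
names, not kernel APIs, and the counters merely stand in for the CMO, TLBI
and DSB instructions. It assumes one DSB after each CMO and each TLBI in
the per-entry case, and a single barrier plus one range invalidation in
the batched case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_PTE_VALID		1ULL
#define TOY_PTRS_PER_TABLE	512

struct toy_stats {
	unsigned int cmos;	/* cache maintenance operations issued */
	unsigned int tlbis;	/* TLB invalidations issued */
	unsigned int dsbs;	/* barriers issued */
};

/* Per-entry scheme: every valid leaf pays for a TLBI and a CMO, each
 * followed by its own barrier (assumed cost model). */
static void toy_unmap_per_entry(uint64_t *ptes, size_t n, struct toy_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		if (!(ptes[i] & TOY_PTE_VALID))
			continue;
		ptes[i] = 0;	/* clear the entry */
		st->tlbis++;	/* invalidate this one IPA */
		st->dsbs++;	/* wait for the TLBI */
		st->cmos++;	/* clean/invalidate the backing memory */
		st->dsbs++;	/* wait for the CMO */
	}
}

/* Batched scheme: CMOs are issued without barriers during the walk,
 * mid-walk TLBIs are elided, and the walk ends with one barrier and
 * one range invalidation. */
static void toy_unmap_batched(uint64_t *ptes, size_t n, struct toy_stats *st)
{
	unsigned int cleared = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(ptes[i] & TOY_PTE_VALID))
			continue;
		ptes[i] = 0;	/* clear the entry */
		st->cmos++;	/* CMO, no barrier yet */
		cleared++;
	}

	if (!cleared)
		return;		/* nothing was mapped, nothing to flush */

	st->dsbs++;		/* one barrier covers all CMOs and PTE writes */
	st->tlbis++;		/* one invalidation of [addr, addr + size) */
}

/* Build a table with n_valid valid entries and unmap it with either scheme. */
static struct toy_stats toy_run(int batched, size_t n_valid)
{
	uint64_t ptes[TOY_PTRS_PER_TABLE] = { 0 };
	struct toy_stats st = { 0, 0, 0 };

	assert(n_valid <= TOY_PTRS_PER_TABLE);
	for (size_t i = 0; i < n_valid; i++)
		ptes[i] = TOY_PTE_VALID;

	if (batched)
		toy_unmap_batched(ptes, TOY_PTRS_PER_TABLE, &st);
	else
		toy_unmap_per_entry(ptes, TOY_PTRS_PER_TABLE, &st);

	return st;
}
```

For a fully populated 512-entry table, toy_run(0, 512) counts 1024
barriers and 512 TLBIs, while toy_run(1, 512) counts one of each. The
model only counts operations; it does not model stale TLB hits before
the final invalidation or when the unmapped pages may be reused.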
>
Correct me if I'm wrong, but if we invalidate the TLB after the walk
is complete, don't you think there's a risk of race if the guest can
hit in the TLB even though the page was unmapped?

Thanks,
Raghavendra
> > +	mm_ops->free_removed_table(childp, ctx->level);
> > +	return 0;
> > +}
> > +
>
> --
> Thanks,
> Oliver