On Fri, Oct 07, 2022, Oliver Upton wrote: > Use an opaque type for pteps and require visitors explicitly dereference > the pointer before using. Protecting page table memory with RCU requires > that KVM dereferences RCU-annotated pointers before using. However, RCU > is not available for use in the nVHE hypervisor and the opaque type can > be conditionally annotated with RCU for the stage-2 MMU. > > Call the type a 'pteref' to avoid a naming collision with raw pteps. No > functional change intended. > > Signed-off-by: Oliver Upton <oliver.upton@xxxxxxxxx> > --- > arch/arm64/include/asm/kvm_pgtable.h | 9 ++++++++- > arch/arm64/kvm/hyp/pgtable.c | 23 ++++++++++++----------- > 2 files changed, 20 insertions(+), 12 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h > index c33edcf36b5b..beb89eac155c 100644 > --- a/arch/arm64/include/asm/kvm_pgtable.h > +++ b/arch/arm64/include/asm/kvm_pgtable.h > @@ -25,6 +25,13 @@ static inline u64 kvm_get_parange(u64 mmfr0) > > typedef u64 kvm_pte_t; > > +typedef kvm_pte_t *kvm_pteref_t; > + > +static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared) > +{ > + return pteref; Returning the pointer is unsafe (when it becomes RCU-protected). The full dereference of the data needs to occur under RCU protection, not just the retrieval of the pointer. E.g. this (straw man) would be broken bool table = kvm_pte_table(ctx.old, level); rcu_read_lock(); ptep = kvm_dereference_pteref(pteref, flags & KVM_PGTABLE_WALK_SHARED); rcu_read_unlock(); if (table && (ctx.flags & KVM_PGTABLE_WALK_TABLE_PRE)) ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_PRE); if (!table && (ctx.flags & KVM_PGTABLE_WALK_LEAF)) { ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_LEAF); ctx.old = READ_ONCE(*ptep); table = kvm_pte_table(ctx.old, level); } as the read of the entry pointed at by ptep could be to a page table that is freed in an RCU callback. The naming collision you are trying to avoid is a symptom of this bad pattern, as there should never be "raw" pteps floating around, at least not in non-pKVM contexts that utilize RCU. > +} > + > #define KVM_PTE_VALID BIT(0) > > #define KVM_PTE_ADDR_MASK GENMASK(47, PAGE_SHIFT) > @@ -170,7 +177,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end, > struct kvm_pgtable { > u32 ia_bits; > u32 start_level; > - kvm_pte_t *pgd; > + kvm_pteref_t pgd; > struct kvm_pgtable_mm_ops *mm_ops; > > /* Stage-2 only */ > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index 02c33fccb178..6b6e1ed7ee2f 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -175,13 +175,14 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, > } > > static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data, > - struct kvm_pgtable_mm_ops *mm_ops, kvm_pte_t *pgtable, u32 level); > + struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level); > > static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, > struct kvm_pgtable_mm_ops *mm_ops, > - kvm_pte_t *ptep, u32 level) > + kvm_pteref_t pteref, u32 level) > { > enum kvm_pgtable_walk_flags flags = data->walker->flags; > + kvm_pte_t *ptep = kvm_dereference_pteref(pteref, false); > struct kvm_pgtable_visit_ctx ctx = { > .ptep = ptep, > .old = READ_ONCE(*ptep), This is where you want the protection to kick in, e.g. typedef kvm_pte_t __rcu *kvm_ptep_t; static inline kvm_pte_t kvm_read_pte(kvm_ptep_t ptep) { return READ_ONCE(*rcu_dereference(ptep)); } .old = kvm_read_pte(ptep), In other words, the pointer itself isn't that's protected, it's PTE that the pointer points at that's protected. rcu_dereference() has no overhead when CONFIG_PROVE_RCU=n, i.e. there's no reason to "optimize" dereferences.