Hi Gavin,

On Wed, Sep 02, 2020 at 04:31:32PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > The KVM page-table code is intricately tied into the kernel page-table
> > code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> > to reduce code duplication. Unfortunately, the reality is that there is
> > an awful lot of code required to make this work, and at the end of the
> > day you're limited to creating page-tables with the same configuration
> > as the host kernel. Furthermore, lifting the page-table code to run
> > directly at EL2 on a non-VHE system (as we plan to do in future
> > patches) is practically impossible due to the number of dependencies it
> > has on the core kernel.
> >
> > Introduce a framework for walking Armv8 page-tables configured
> > independently from the host kernel.
> >
> > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > Cc: Quentin Perret <qperret@xxxxxxxxxx>
> > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
> >  arch/arm64/kvm/hyp/Makefile          |   2 +-
> >  arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
> >  3 files changed, 392 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
> >  create mode 100644 arch/arm64/kvm/hyp/pgtable.c

[...]

> > +struct kvm_pgtable_walk_data {
> > +	struct kvm_pgtable		*pgt;
> > +	struct kvm_pgtable_walker	*walker;
> > +
> > +	u64				addr;
> > +	u64				end;
> > +};
> > +
>
> Some of the following functions might be worth inlining, considering
> their complexity :)

I'll leave that for the compiler to figure out :)

> > +static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
> > +{
> > +	struct kvm_pgtable pgt = {
> > +		.ia_bits = ia_bits,
> > +		.start_level = start_level,
> > +	};
> > +
> > +	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> > +}
> > +
>
> It seems @pgt.start_level is assigned the wrong value here.
> For example, @start_level is 2 when @ia_bits and PAGE_SIZE
> are 40 and 64KB respectively. In this case, __kvm_pgd_page_idx()
> always returns zero. However, the extra page covers up the
> issue. I think something like below might be needed:
>
>	struct kvm_pgtable pgt = {
>		.ia_bits = ia_bits,
>		.start_level = KVM_PGTABLE_MAX_LEVELS - start_level + 1,
>	};

Hmm, we're pulling the start_level right out of the vtcr, so I don't see
how it can be wrong. In your example, a start_level of 2 seems correct
to me, as we'll translate 13 bits there, then 13 bits at level 3, which
covers the 24 bits you need (with a 16-bit offset within the page).

Your suggestion would give us a start_level of 1, which has a redundant
level of translation. Maybe you're looking at the levels upside-down?
The top level is level 0, and each time you walk to a new level, that
number increases.

But perhaps I'm missing something. Please could you elaborate if you
think there's a problem here?
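FWIW, the level arithmetic for a 64KB granule can be sketched in a
standalone program like the one below (the helper name is made up and
this isn't code from the patch):

	#include <stdio.h>

	/*
	 * With a 64KB granule there is a 16-bit offset within the page and
	 * each table level translates 13 bits of the input address. Armv8
	 * numbers the top level 0 and the leaf level 3, so the walk starts
	 * at 4 minus the number of levels required.
	 */
	static unsigned int start_level_64k(unsigned int ia_bits)
	{
		unsigned int levels = (ia_bits - 16 + 13 - 1) / 13; /* ceil */

		return 4 - levels;
	}

	int main(void)
	{
		/* ia_bits == 40: 24 bits to translate, 2 levels, start at level 2 */
		printf("ia_bits=40 -> start_level=%u\n", start_level_64k(40));
		/* ia_bits == 48: 32 bits to translate, 3 levels, start at level 1 */
		printf("ia_bits=48 -> start_level=%u\n", start_level_64k(48));
		return 0;
	}
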
> > +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> > +{
> > +	u32 idx;
> > +	int ret = 0;
> > +	struct kvm_pgtable *pgt = data->pgt;
> > +	u64 limit = BIT(pgt->ia_bits);
> > +
> > +	if (data->addr > limit || data->end > limit)
> > +		return -ERANGE;
> > +
> > +	if (!pgt->pgd)
> > +		return -EINVAL;
> > +
> > +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> > +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> > +
> > +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> > +		if (ret)
> > +			break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
>
> I guess we need to bail on the following condition:
>
>	if (data->addr >= limit || data->end >= limit)
>		return -ERANGE;

What's wrong with the existing check? In particular, I think we _want_
to support data->end == limit (it's exclusive). If data->addr == limit,
then we'll have a size of zero and the loop won't run.
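To make the half-open semantics concrete, here's a standalone sketch
(the helper name is made up; this isn't code from the patch):

	#include <stdbool.h>
	#include <stdio.h>

	/*
	 * [addr, end) is half-open: end == limit is a legal upper bound,
	 * and addr == end is an empty range, so the walk loop simply never
	 * runs. Only addresses strictly beyond the limit are rejected.
	 */
	static bool range_ok(unsigned long long addr, unsigned long long end,
			     unsigned long long limit)
	{
		return addr <= limit && end <= limit;
	}

	int main(void)
	{
		unsigned long long limit = 1ULL << 40; /* BIT(ia_bits), ia_bits == 40 */

		printf("%d\n", range_ok(0, limit, limit));     /* 1: end may equal limit */
		printf("%d\n", range_ok(limit, limit, limit)); /* 1: empty range */
		printf("%d\n", range_ok(0, limit + 1, limit)); /* 0: past the limit */
		return 0;
	}

Will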