On Thu, Aug 27, 2020 at 05:27:13PM +0100, Alexandru Elisei wrote: > It looks to me like the fact that code doesn't take into account the fact that we > can have concatenated pages at the initial level of lookup. Am I missing > something? Is it added in later patches and I missed it? I've commented below in a > few places where I noticed that. (seems like you figured some of this out in a later reply). > On 8/25/20 10:39 AM, Will Deacon wrote: > > The KVM page-table code is intricately tied into the kernel page-table > > code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt > > to reduce code duplication. Unfortunately, the reality is that there is > > an awful lot of code required to make this work, and at the end of the > > day you're limited to creating page-tables with the same configuration > > as the host kernel. Furthermore, lifting the page-table code to run > > directly at EL2 on a non-VHE system (as we plan to to do in future > > patches) is practically impossible due to the number of dependencies it > > has on the core kernel. > > > > Introduce a framework for walking Armv8 page-tables configured > > independently from the host kernel. > > > > Cc: Marc Zyngier <maz@xxxxxxxxxx> > > Cc: Quentin Perret <qperret@xxxxxxxxxx> > > Signed-off-by: Will Deacon <will@xxxxxxxxxx> > > --- > > arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++ > > arch/arm64/kvm/hyp/Makefile | 2 +- > > arch/arm64/kvm/hyp/pgtable.c | 290 +++++++++++++++++++++++++++ > > 3 files changed, 392 insertions(+), 1 deletion(-) > > create mode 100644 arch/arm64/include/asm/kvm_pgtable.h > > create mode 100644 arch/arm64/kvm/hyp/pgtable.c [...] > > +static u64 kvm_granule_shift(u32 level) > > +{ > > + return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3; > > Isn't that the same same thing as the macro ARM64_HW_PGTABLE_LEVEL_SHIFT(n) from > pgtable-hwdef.h? I think the header is already included, as this file uses > PTRS_PER_PTE and that's the only place I found it defined. Hmm, that's an interesting one. If we ever want to adjust KVM_PGTABLE_MAX_LEVELS things will break, so we just need to take that into account should future architecture extensions add an extra level. I suppose I can add a comment to that effect and use ARM64_HW_PGTABLE_LEVEL_SHIFT() instead. > > > +} > > + > > +static u64 kvm_granule_size(u32 level) > > +{ > > + return BIT(kvm_granule_shift(level)); > > +} > > + > > +static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level) > > +{ > > + u64 granule = kvm_granule_size(level); > > + > > + /* > > + * Reject invalid block mappings and don't bother with 4TB mappings for > > + * 52-bit PAs. > > + */ > > + if (level == 0 || (PAGE_SIZE != SZ_4K && level == 1)) > > + return false; > > + > > + if (granule > (end - addr)) > > + return false; > > + > > + return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule); > > +} > > This is a very nice rewrite of fault_supports_stage2_huge_mapping, definitely > easier to understand. Thanks! > > +static u32 kvm_start_level(u64 ia_bits) > > +{ > > + u64 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3); > > Isn't that the same same thing as the macro ARM64_HW_PGTABLE_LEVELS from > pgtable-hwdef.h? Yes, although this is slightly more idiomatic due to its use of DIV_ROUND_UP imo. But happy to replace it. > > > + return KVM_PGTABLE_MAX_LEVELS - levels; > > I tried to verify this formula and I think there's something that I don't > understand or I'm missing. For the default KVM setup, where the user doesn't > specify an IPA size different from the 40 bits default: ia_bits = 40 (IPA = > [39:0]), 4KB pages, translation starting at level 1 with 2 concatenated level 1 > tables (VTCR_EL2.T0SZ = 24, VTCR_EL2.SL0 = 1, VTCR_EL2.TG0 = 0, starting level > from table D5-13 at page D5-2566, ARM DDI 0487F.b), according to the formula I get: > > levels = DIV_ROUND_UP(40 - 12, 12 -3) = DIV_ROUND_UP(28, 9) = 4 > return 4 - 4 = 0 > > which means the resulting starting level is 0 instead of 1. Yeah, this is fiddly. kvm_start_level() doesn't cater for concatenation at all and it's only used to determine the start level for the hypervisor stage-1 table. For the stage-2 page-tables, we actually extract the start level back out of the vtcr, as that gets configured separately and so we just parameterise ourselves around that. I think I'll remove kvm_start_level() entirely, and just inlined it into its single call site (which will be neater using ARM64_HW_PGTABLE_LEVELS). > > > +} > > + > > +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level) > > +{ > > + u64 shift = kvm_granule_shift(level); > > + u64 mask = BIT(PAGE_SHIFT - 3) - 1; > > This doesn't seem to take into account the fact that we can have concatenated > initial page tables. This is ok, as we basically process the PGD one page at a time so that the details of concatenation only really need to be exposed to the iterator. See the use of kvm_pgd_page_idx() in _kvm_pgtable_walk(). > > +static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, > > + kvm_pte_t *ptep, u32 level) > > +{ > > + int ret = 0; > > + u64 addr = data->addr; > > + kvm_pte_t *childp, pte = *ptep; > > + bool table = kvm_pte_table(pte, level); > > + enum kvm_pgtable_walk_flags flags = data->walker->flags; > > + > > + if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) { > > + ret = kvm_pgtable_visitor_cb(data, addr, level, ptep, > > + KVM_PGTABLE_WALK_TABLE_PRE); > > I see that below we check if the visitor modified the leaf entry and turned into a > table. Is it not allowed for a visitor to turn a table into a block mapping? It is allowed, but in that case we don't revisit the block entry, as there's really no need. Compare that with installing a table, where you may well want to descend into the new table to initialise the new entries in there. The kerneldoc for kvm_pgtable_walk() talks a bit about this. (aside: that function isn't actually used, but it felt useful to expose it as an interface). Thanks for the review, Will _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm