Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure

Will Deacon <will@xxxxxxxxxx> · Wed, 2 Sep 2020 11:36:49 +0100

On Thu, Aug 27, 2020 at 05:27:13PM +0100, Alexandru Elisei wrote:
> It looks to me like the fact that code doesn't take into account the fact that we
> can have concatenated pages at the initial level of lookup. Am I missing
> something? Is it added in later patches and I missed it? I've commented below in a
> few places where I noticed that.

(seems like you figured some of this out in a later reply).

> On 8/25/20 10:39 AM, Will Deacon wrote:
> > The KVM page-table code is intricately tied into the kernel page-table
> > code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> > to reduce code duplication. Unfortunately, the reality is that there is
> > an awful lot of code required to make this work, and at the end of the
> > day you're limited to creating page-tables with the same configuration
> > as the host kernel. Furthermore, lifting the page-table code to run
> > directly at EL2 on a non-VHE system (as we plan to to do in future
> > patches) is practically impossible due to the number of dependencies it
> > has on the core kernel.
> >
> > Introduce a framework for walking Armv8 page-tables configured
> > independently from the host kernel.
> >
> > Cc: Marc Zyngier <maz@xxxxxxxxxx>
> > Cc: Quentin Perret <qperret@xxxxxxxxxx>
> > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
> >  arch/arm64/kvm/hyp/Makefile          |   2 +-
> >  arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
> >  3 files changed, 392 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
> >  create mode 100644 arch/arm64/kvm/hyp/pgtable.c

[...]

> > +static u64 kvm_granule_shift(u32 level)
> > +{
> > +	return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3;
> 
> Isn't that the same same thing as the macro ARM64_HW_PGTABLE_LEVEL_SHIFT(n) from
> pgtable-hwdef.h? I think the header is already included, as this file uses
> PTRS_PER_PTE and that's the only place I found it defined.

Hmm, that's an interesting one. If we ever want to adjust KVM_PGTABLE_MAX_LEVELS
things will break, so we just need to take that into account should future
architecture extensions add an extra level. I suppose I can add a comment
to that effect and use ARM64_HW_PGTABLE_LEVEL_SHIFT() instead.

> 
> > +}
> > +
> > +static u64 kvm_granule_size(u32 level)
> > +{
> > +	return BIT(kvm_granule_shift(level));
> > +}
> > +
> > +static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> > +{
> > +	u64 granule = kvm_granule_size(level);
> > +
> > +	/*
> > +	 * Reject invalid block mappings and don't bother with 4TB mappings for
> > +	 * 52-bit PAs.
> > +	 */
> > +	if (level == 0 || (PAGE_SIZE != SZ_4K && level == 1))
> > +		return false;
> > +
> > +	if (granule > (end - addr))
> > +		return false;
> > +
> > +	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
> > +}
> 
> This is a very nice rewrite of fault_supports_stage2_huge_mapping, definitely
> easier to understand.

Thanks!

> > +static u32 kvm_start_level(u64 ia_bits)
> > +{
> > +	u64 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3);
> 
> Isn't that the same same thing as the macro ARM64_HW_PGTABLE_LEVELS from
> pgtable-hwdef.h?

Yes, although this is slightly more idiomatic due to its use of
DIV_ROUND_UP imo. But happy to replace it.

> 
> > +	return KVM_PGTABLE_MAX_LEVELS - levels;
> 
> I tried to verify this formula and I think there's something that I don't
> understand or I'm missing. For the default KVM setup, where the user doesn't
> specify an IPA size different from the 40 bits default: ia_bits = 40 (IPA =
> [39:0]), 4KB pages, translation starting at level 1 with 2 concatenated level 1
> tables (VTCR_EL2.T0SZ = 24, VTCR_EL2.SL0 = 1, VTCR_EL2.TG0 = 0, starting level
> from table D5-13 at page D5-2566, ARM DDI 0487F.b), according to the formula I get:
> 
> levels = DIV_ROUND_UP(40 - 12, 12 -3) = DIV_ROUND_UP(28, 9) = 4
> return 4 - 4 = 0
> 
> which means the resulting starting level is 0 instead of 1.

Yeah, this is fiddly. kvm_start_level() doesn't cater for concatenation at
all and it's only used to determine the start level for the hypervisor
stage-1 table. For the stage-2 page-tables, we actually extract the start
level back out of the vtcr, as that gets configured separately and so we
just parameterise ourselves around that.

I think I'll remove kvm_start_level() entirely, and just inlined it into
its single call site (which will be neater using ARM64_HW_PGTABLE_LEVELS).

> 
> > +}
> > +
> > +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> > +{
> > +	u64 shift = kvm_granule_shift(level);
> > +	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
> 
> This doesn't seem to take into account the fact that we can have concatenated
> initial page tables.

This is ok, as we basically process the PGD one page at a time so that the
details of concatenation only really need to be exposed to the iterator.
See the use of kvm_pgd_page_idx() in _kvm_pgtable_walk().

> > +static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> > +				      kvm_pte_t *ptep, u32 level)
> > +{
> > +	int ret = 0;
> > +	u64 addr = data->addr;
> > +	kvm_pte_t *childp, pte = *ptep;
> > +	bool table = kvm_pte_table(pte, level);
> > +	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> > +
> > +	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> > +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> > +					     KVM_PGTABLE_WALK_TABLE_PRE);
> 
> I see that below we check if the visitor modified the leaf entry and turned into a
> table. Is it not allowed for a visitor to turn a table into a block mapping?

It is allowed, but in that case we don't revisit the block entry, as there's
really no need. Compare that with installing a table, where you may well
want to descend into the new table to initialise the new entries in there.

The kerneldoc for kvm_pgtable_walk() talks a bit about this. (aside: that
function isn't actually used, but it felt useful to expose it as an
interface).

Thanks for the review,

Will
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm