On 23/04/13 23:58, Christoffer Dall wrote:
> On Mon, Apr 08, 2013 at 05:17:10PM +0100, Marc Zyngier wrote:
>> Define the arm64 specific MMU backend:
>> - HYP/kernel VA offset
>> - S2 4/64kB definitions
>> - S2 page table populating and flushing
>> - icache cleaning
>>
>> Reviewed-by: Christopher Covington <cov@xxxxxxxxxxxxxx>
>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>> ---
>>  arch/arm64/include/asm/kvm_mmu.h | 136 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 136 insertions(+)
>>  create mode 100644 arch/arm64/include/asm/kvm_mmu.h
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> new file mode 100644
>> index 0000000..2eb2230
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -0,0 +1,136 @@
>> +/*
>> + * Copyright (C) 2012,2013 - ARM Ltd
>> + * Author: Marc Zyngier <marc.zyngier@xxxxxxx>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __ARM64_KVM_MMU_H__
>> +#define __ARM64_KVM_MMU_H__
>> +
>> +#include <asm/page.h>
>> +#include <asm/memory.h>
>> +
>> +/*
>> + * As we only have the TTBR0_EL2 register, we cannot express
>> + * "negative" addresses. This makes it impossible to directly share
>> + * mappings with the kernel.
>> + *
>> + * Instead, give the HYP mode its own VA region at a fixed offset from
>> + * the kernel by just masking the top bits (which are all ones for a
>> + * kernel address).
>
> For some reason I keep choking on this, despite it being very simple.
> We're just defining a different PAGE_OFFSET, right? Why not do a hard
> define as:
>
> #define HYP_PAGE_OFFSET_MASK	0x0000ffffffffffff
> #define HYP_PAGE_OFFSET		0x0000ffc000000000
>
> ...or change the second paragraph of the comment to say
> that we define the HYP_PAGE_OFFSET to be 0x0000ffc000000000.

One of these days, VA_BITS will change to accommodate more virtual
space. When that day comes, I don't want to touch any of this, because
it hurt enough when writing it. As such, I'll refrain from hardcoding
anything. I don't mind a comment, though.

>> + */
>> +#define HYP_PAGE_OFFSET_SHIFT	VA_BITS
>> +#define HYP_PAGE_OFFSET_MASK	((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1)
>
> In any case, is there a reason for the HYP_PAGE_OFFSET_SHIFT
> indirection? It may be simpler without...

It is common practice to have XXX_SHIFT and XXX_MASK together.

>> +#define HYP_PAGE_OFFSET		(PAGE_OFFSET & HYP_PAGE_OFFSET_MASK)
>> +
>> +/*
>> + * Our virtual mapping for the idmap-ed MMU-enable code. Must be
>> + * shared across all the page-tables. Conveniently, we use the last
>> + * possible page, where no kernel mapping will ever exist.
>> + */
>> +#define TRAMPOLINE_VA		(HYP_PAGE_OFFSET_MASK & PAGE_MASK)
>
> hmmm, ok, here it's kind of nice to have that define correlation, so
> maybe it's not cleaner. Something should be improved here in the define
> or the comment to make it more clear. Perhaps just adding the real
> constants in the comment or in Documentation/arm64/memory.txt would
> help.

Yes, I plan to write something there.

>> +
>> +#ifdef __ASSEMBLY__
>> +
>> +/*
>> + * Convert a kernel VA into a HYP VA.
>> + * reg: VA to be converted.
>> + */
>> +.macro kern_hyp_va	reg
>> +	and	\reg, \reg, #HYP_PAGE_OFFSET_MASK
>> +.endm
>> +
>> +#else
>> +
>> +#include <asm/cacheflush.h>
>> +
>> +#define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
>> +
>> +/*
>> + * Align KVM with the kernel's view of physical memory. Should be
>> + * 40bit IPA, with PGD being 8kB aligned.
>> + */
>> +#define KVM_PHYS_SHIFT	PHYS_MASK_SHIFT
>> +#define KVM_PHYS_SIZE	(1UL << KVM_PHYS_SHIFT)
>> +#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1UL)
>> +
>> +#ifdef CONFIG_ARM64_64K_PAGES
>> +#define PAGE_LEVELS	2
>> +#define BITS_PER_LEVEL	13
>> +#else  /* 4kB pages */
>> +#define PAGE_LEVELS	3
>> +#define BITS_PER_LEVEL	9
>> +#endif
>
> What are the semantics of these defines exactly? They should be
> S2_PAGE_LEVELS and make some assumptions of the VTCR_EL2.SL0 field
> right?

Indeed, we assume SL0 is always 1, just like for the kernel.

As for the semantics, I thought they were pretty obvious...
PAGE_LEVELS is just that, the number of page levels. BITS_PER_LEVEL is
the number of bits you need to index a particular level.

>> +
>> +/* Make sure we get the right size, and thus the right alignment */
>
> this comment is not particularly helpful, something explaining why the
> max() thingy below makes sense would be more so :) I really can't
> follow the BITS_PER_S2_PGD and PTRS_PER_S2_PGD defines.

Admittedly, they are a bit convoluted:

>> +#define BITS_PER_S2_PGD	(KVM_PHYS_SHIFT - (PAGE_LEVELS - 1) * BITS_PER_LEVEL - PAGE_SHIFT)

This is how many bits of index you need in a PGD. With a 40bit IPA and
a 4kB page size, you end up with a 10 bit index.

>> +#define PTRS_PER_S2_PGD	(1 << max(BITS_PER_LEVEL, BITS_PER_S2_PGD))

Now, in order to allocate your PGD, you need to find out how many
pointers your PGD is going to contain. The max() is to make sure we
allocate at least a full page worth of pointers.
>> +#define S2_PGD_ORDER	get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
>> +#define S2_PGD_SIZE	(1 << S2_PGD_ORDER)
>> +
>> +int create_hyp_mappings(void *from, void *to);
>> +int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
>> +void free_hyp_pgds(void);
>> +
>> +int kvm_alloc_stage2_pgd(struct kvm *kvm);
>> +void kvm_free_stage2_pgd(struct kvm *kvm);
>> +int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>> +			  phys_addr_t pa, unsigned long size);
>> +
>> +int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>> +
>> +void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
>> +
>> +phys_addr_t kvm_mmu_get_httbr(void);
>> +phys_addr_t kvm_mmu_get_boot_httbr(void);
>> +phys_addr_t kvm_get_idmap_vector(void);
>> +int kvm_mmu_init(void);
>> +void kvm_clear_hyp_idmap(void);
>> +
>> +#define kvm_set_pte(ptep, pte)	set_pte(ptep, pte)
>
> I noticed that this doesn't do any cache cleaning. Are the MMU page
> table walks guaranteed to be coherent with the MMU on arm64?

I suppose you meant the cache. In this case, yes. The hardware page
table walker must snoop the caches.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html