Re: [PATCH 13/13] KVM: PPC: e500: MMU API

Alexander Graf <agraf@xxxxxxx> · Thu, 19 May 2011 19:37:05 +0200

On 19.05.2011, at 18:28, Scott Wood wrote:

> On Thu, 19 May 2011 16:10:48 +0200
> Alexander Graf <agraf@xxxxxxx> wrote:
> 
>> On 05/18/2011 01:42 AM, Scott Wood wrote:
>>> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
>> 
>> Can we rename those KVM_MMU_BOOKE_206_NOHV and KVM_MMU_BOOKE_206_HV? 
>> Same for the function names. It's just easier to read and more accurate. 
>> Also I don't like how it says KVM_MMU_FSL :). It's really a booke interface.
> 
> That's more or less how I had it initially.  In our earlier discussion I
> changed it to FSL because of assumptions about TLB layout, such as the set
> hash:
> 
> http://www.spinics.net/lists/kvm-ppc/msg02641.html
> http://www.spinics.net/lists/kvm-ppc/msg02645.html

Ah, that discussion was about calling it book3e, and that name is too generic. If the interface could satisfy any book3e 2.06 MMU, then calling it BOOKE_206 would be just fine. If it can only handle e500, call it FSL or E500 - whichever fits you better. I'd prefer if the interface was clever enough to also be used by non-FSL 2.06 implementations, in case any of those follow, if it's not too much additional overhead.

> 
>>> + - The "params" field is of type "struct kvm_fsl_booke_tlb_params".
>> 
>> I suppose params is a pointer?
> 
> Yes.
> 
>>> + - The "array" field points to an array of type "struct
>>> +   kvm_fsl_booke_tlb_entry".
>>> + - The array consists of all entries in the first TLB, followed by all
>>> +   entries in the second TLB.
>>> + - Within TLB0 entries are ordered first by increasing set number.  Within a
>>> +   set, entries are ordered by way (increasing ESEL).
>>> +   The hash for determining set number is: (MAS2>>  12)&  (num_sets - 1)
>>> +   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
>>> + - Within TLB1, entries are ordered by increasing ESEL.
>> 
>> Don't special-case it. TLB1 basically contains of only a single set. The 
>> MMU interface here shouldn't be too e500 specific :).
> 
> How do I define the hash for determining set number without being
> e500-specific?

Hm. Isn't the hash always the way you put it above? Maybe I misread the spec :). The only obvious thing it states there are the corner cases O_o.

> 
>>> diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
>>> index d2ca5ed..2419be2 100644
>>> --- a/arch/powerpc/include/asm/kvm.h
>>> +++ b/arch/powerpc/include/asm/kvm.h
>>> @@ -272,4 +272,39 @@ struct kvm_guest_debug_arch {
>>>  #define KVM_INTERRUPT_UNSET	-2U
>>>  #define KVM_INTERRUPT_SET_LEVEL	-3U
>>> 
>>> +struct kvm_fsl_booke_tlb_entry {
>>> +	__u32 mas8;
>>> +	__u32 mas1;
>>> +	__u64 mas2;
>>> +	__u64 mas7_3;
>> 
>> Is there a good reason to not include the other MAS registers as defined 
>> by the architecture? That'd make the interface easily extensible for the 
>> future and should only eat a few additional bytes, no?
> 
> Those registers aren't part of a TLB entry, they're used for identifying
> the slot or searching.

Ah, yeah :). Sorry for the fuss.

> 
>>> +struct kvm_fsl_booke_tlb_params {
>>> +	/*
>>> +	 * book3e defines 4 TLBs, but current FSL Book E chips implement
>>> +	 * only the first two.  The second two entries in tlb_sizes[]
>>> +	 * and tlb_ways[] are reserved and must be zero.
>> 
>> I don't think that part is necessary. It's really an implementation detail.
> 
> It is not an implementation detail.  KVM on FSL Book E does not support
> TLB2 or TLB3.  I had more generic wording when it was not FSL specific,
> where the supported TLBs were determined by reading the TLB config
> registers.

Either the struct is 100% e500 specific, then you only need 2 TLBs in there, or it's generic, then the actual implementation of TLB2 and TLB3 are open and as of yet unimplemented, but not reserved for eternity :).

> 
>>> +	 *
>>> +	 * A tlb_ways value of zero means the array is fully associative.
>>> +	 * Only TLB0 may be configured with a different associativity.  The
>>> +	 * number of ways of TLB0 must be a power of two between 2 and 16.
>>> +	 *
>>> +	 * The size of TLB0 must be a multiple of the number of ways, and
>>> +	 * the number of sets must be a power of two.
>>> +	 *
>>> +	 * The size of TLB1 may not exceed 64 entries.
>> 
>> Why not?
> 
> I wanted to set some limit, at least as long as we process TLB1 linearly
> rather than use some fancy data structure.  64 is the most we currently
> have in hardware, and I don't think we have any plans to increase that
> while maintaining full associativity.
> 
> If we need to change it later we can -- either with a capability, or by
> having qemu iterate to find the max.

Ok :).

> 
>>> +	 * KVM will adjust TLBnCFG based on the sizes configured here,
>>> +	 * though arrays greater than 2048 entries will have TLBnCFG[NENTRY]
>>> +	 * set to zero.
>>> +	 *
>>> +	 * TLB0 supports 4 KiB pages.
>>> +	 * The page sizes supported by TLB1 are as indicated by
>>> +	 * TLB1CFG (if MMUCFG[MAVN] = 0) or TLB1PS (if MMUCFG[MAVN] = 1).
>> 
>> This is not part of the interface, is it?
> 
> It provides a way for qemu to discover what page sizes KVM will deal with.

Hrm. So where does it get MMUCFG from?

> 
>>> @@ -956,51 +994,229 @@ void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid)
>>> 
>>>  void kvmppc_e500_tlb_setup(struct kvmppc_vcpu_e500 *vcpu_e500)
>>>  {
>>> -	struct tlbe *tlbe;
>>> +	struct kvm_fsl_booke_tlb_entry *tlbe;
>>> 
>>>  	/* Insert large initial mapping for guest. */
>>> -	tlbe =&vcpu_e500->gtlb_arch[1][0];
>>> +	tlbe = get_entry(vcpu_e500, 1, 0);
>>>  	tlbe->mas1 = MAS1_VALID | MAS1_TSIZE(BOOK3E_PAGESZ_256M);
>>>  	tlbe->mas2 = 0;
>>> -	tlbe->mas3 = E500_TLB_SUPER_PERM_MASK;
>>> -	tlbe->mas7 = 0;
>>> +	tlbe->mas7_3 = E500_TLB_SUPER_PERM_MASK;
>>> 
>>>  	/* 4K map for serial output. Used by kernel wrapper. */
>> 
>> Which kernel wrapper? At the end of the day, these initial TLB values 
>> should be pushed down from Qemu btw. They really don't belong here. A 
>> different guest (u-boot for example) might need completely different 
>> mappings.
> 
> That was not added by this patchset...  and yes, the plan is for qemu to
> set the inital TLB using the MMU interface.  The entries created here are
> just for legacy qemu.

It just came to mind when looking through the patch - of course the code needs to stay for old user space, but I wanted to make sure we're on the same page wrt where we're going with TLB inits :).

> 
>>> +
>>> +int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
>>> +			      struct kvm_config_tlb *cfg)
>>> +{
>>> +	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>>> +	struct kvm_fsl_booke_tlb_params params;
>>> +	char *virt;
>>> +	struct page **pages;
>>> +	struct tlbe_priv *privs[2] = {};
>>> +	size_t array_len;
>>> +	u32 sets;
>>> +	int num_pages, ret, i;
>>> +
>>> +	if (cfg->mmu_type != KVM_MMU_FSL_BOOKE_NOHV)
>>> +		return -EINVAL;
>>> +
>>> +	if (copy_from_user(&params, (void __user *)(uintptr_t)cfg->params,
>>> +			   sizeof(params)))
>>> +		return -EFAULT;
>>> +
>>> +	if (params.tlb_sizes[1]>  64)
>>> +		return -EINVAL;
>>> +	if (params.tlb_sizes[2] != 0 || params.tlb_sizes[3] != 0)
>>> +		return -EINVAL;
>>> +	if (params.tlb_ways[1] != 0 || params.tlb_ways[2] != 0 |
>> 
>> Hrm - I always thought that basically the TLB ID would be:
>> 
>>   (some bits of the EA) || (ESEL)
>> 
>> So in the TLB1 case, "some bits" would mean "no bits" and ESEL is all of 
>> the information we get. And ESEL is ways, no?
> 
> TLBnCFG allows ASSOC to be either zero or equal to NENTRY to indicate
> full associativity.  In this API I chose just one way of representing it
> for simplicity.  I can pick ASSOC = NENTRY instead if you prefer.

Ah, ok. I'd find it easier to understand if we just always stick to (EA) || (ESEL) :).

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html