Re: [PATCH 04/14] KVM: PPC: e500: MMU API

Avi Kivity <avi@xxxxxxxxxx> · Mon, 31 Oct 2011 15:24:20 +0200

On 10/31/2011 09:53 AM, Alexander Graf wrote:
> From: Scott Wood <scottwood@xxxxxxxxxxxxx>
>
> This implements a shared-memory API for giving host userspace access to
> the guest's TLB.
>
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 7945b0b..ab1136f 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1383,6 +1383,38 @@ The following flags are defined:
>  If datamatch flag is set, the event will be signaled only if the written value
>  to the registered address is equal to datamatch in struct kvm_ioeventfd.
>  
> +4.59 KVM_DIRTY_TLB
> +
> +Capability: KVM_CAP_SW_TLB
> +Architectures: ppc
> +Type: vcpu ioctl
> +Parameters: struct kvm_dirty_tlb (in)
> +Returns: 0 on success, -1 on error
> +
> +struct kvm_dirty_tlb {
> +	__u64 bitmap;
> +	__u32 num_dirty;
> +};

This is not 32/64-bit safe.  e500 is 32-bit only, yes?  But what if
someone wants to emulate an e500 on a ppc64?  Maybe it's better to add
explicit padding here.
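
Something like the following is what I have in mind.  It is only a
sketch.  The kernel's __u64/__u32 are swapped for <stdint.h> types so it
builds anywhere, and the "pad" field name is made up.  Without the
explicit pad, the struct is 12 bytes on ABIs that align 64-bit integers to
4 bytes (i386, for example) and 16 bytes on most 64-bit ABIs.  With the
pad, both agree:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical padded variant of struct kvm_dirty_tlb (sketch only). */
struct kvm_dirty_tlb_padded {
	uint64_t bitmap;    /* userspace address of the dirty bitmap */
	uint32_t num_dirty; /* number of set bits in the bitmap */
	uint32_t pad;       /* explicit padding, must be zero */
};

/* The explicit pad makes the size a multiple of 8 whether the ABI aligns
 * uint64_t to 4 or to 8 bytes, so 32- and 64-bit userspace agree. */
_Static_assert(sizeof(struct kvm_dirty_tlb_padded) == 16,
	       "layout must not depend on the ABI");
_Static_assert(offsetof(struct kvm_dirty_tlb_padded, num_dirty) == 8,
	       "num_dirty directly follows bitmap");
```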

Another alternative is to drop the num_dirty field (and let the kernel
compute it instead; that shouldn't take long?) and have the third argument
to ioctl() point to the bitmap directly.
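
For reference, computing num_dirty in the kernel is a single pass over the
bitmap.  Here is a minimal sketch.  It assumes the byte-oriented bit order
the patch specifies (bit i lives in byte i / 8, at position i % 8), and
count_dirty() is a hypothetical helper.  A real implementation would
probably count a word at a time:

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits in the first nbits bits of a dirty bitmap laid out with
 * bit i in byte i / 8 at position i % 8. Bits past nbits are ignored. */
static uint32_t count_dirty(const uint8_t *bitmap, uint32_t nbits)
{
	uint32_t count = 0;

	for (uint32_t i = 0; i < nbits; i++)
		if (bitmap[i / 8] & (1u << (i % 8)))
			count++;
	return count;
}
```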

> +
> +This must be called whenever userspace has changed an entry in the shared
> +TLB, prior to calling KVM_RUN on the associated vcpu.
> +
> +The "bitmap" field is the userspace address of an array.  This array
> +consists of a number of bits, equal to the total number of TLB entries as
> +determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
> +nearest multiple of 64.
> +
> +Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
> +array.
> +
> +The array is little-endian: bit 0 is the least significant bit of the
> +first byte, bit 8 is the least significant bit of the second byte, etc.
> +This avoids any complications with differing word sizes.

And people say little/big endian is just a matter of taste.
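
Concretely, with that byte-oriented order, sizing and marking the bitmap
from userspace is independent of host word size and endianness.  This is a
sketch; dirty_bitmap_bytes() and mark_dirty() are illustrative names, not
part of the proposed API:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for the dirty bitmap: one bit per TLB entry, rounded up
 * to a multiple of 64 bits (8 bytes). */
static size_t dirty_bitmap_bytes(uint32_t num_entries)
{
	return ((size_t)num_entries + 63) / 64 * 8;
}

/* Mark TLB entry idx dirty: bit 0 is the LSB of byte 0, bit 8 is the LSB
 * of byte 1, and so on. Indexing bytes keeps this host-independent. */
static void mark_dirty(uint8_t *bitmap, uint32_t idx)
{
	bitmap[idx / 8] |= (uint8_t)(1u << (idx % 8));
}
```

Userspace would zero-allocate dirty_bitmap_bytes(n) bytes, call
mark_dirty() for each modified entry, and pass the buffer's address in the
"bitmap" field.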

> +
> +The "num_dirty" field is a performance hint for KVM to determine whether it
> +should skip processing the bitmap and just invalidate everything.  It must
> +be set to the number of set bits in the bitmap.
> +
>  4.62 KVM_CREATE_SPAPR_TCE
>  
>  Capability: KVM_CAP_SPAPR_TCE
> @@ -1700,3 +1732,45 @@ HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
>  HTAB invisible to the guest.
>  
>  When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
> +
> +6.3 KVM_CAP_SW_TLB
> +
> +Architectures: ppc
> +Parameters: args[0] is the address of a struct kvm_config_tlb
> +Returns: 0 on success; -1 on error
> +
> +struct kvm_config_tlb {
> +	__u64 params;
> +	__u64 array;
> +	__u32 mmu_type;
> +	__u32 array_len;
> +};

Would it not be simpler to use args[0-3] for this, instead of yet
another indirection?
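
That is, something along these lines, where the four fields travel in
args[0..3] of struct kvm_enable_cap.  This is a sketch with three
simplifications:

 - The struct below is a local mirror of the <linux/kvm.h> layout, so it
   builds without kernel headers.
 - The capability number is passed in rather than hardcoded.
 - The argument order simply follows the field order of kvm_config_tlb.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_enable_cap from <linux/kvm.h>. */
struct enable_cap_mirror {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/* Hypothetical: pass the kvm_config_tlb fields directly in args[0..3]
 * instead of a pointer to a separate struct. */
static void fill_sw_tlb_cap(struct enable_cap_mirror *cap, uint32_t cap_nr,
			    const void *params, const void *array,
			    uint32_t mmu_type, uint32_t array_len)
{
	memset(cap, 0, sizeof(*cap)); /* flags and pad must be zero */
	cap->cap = cap_nr;
	cap->args[0] = (uint64_t)(uintptr_t)params;
	cap->args[1] = (uint64_t)(uintptr_t)array;
	cap->args[2] = mmu_type;
	cap->args[3] = array_len;
}
```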

> +
> +Configures the virtual CPU's TLB array, establishing a shared memory area
> +between userspace and KVM.  The "params" and "array" fields are userspace
> +addresses of mmu-type-specific data structures.  The "array_len" field is a
> +safety mechanism, and should be set to the size in bytes of the memory that
> +userspace has reserved for the array.  It must be at least the size dictated
> +by "mmu_type" and "params".
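
For what it's worth, the check implied by "array_len" is cheap.  Below is
a sketch for the FSL Book E types.  The patch's header changes are not
quoted here, so treat the struct layouts as assumptions, apart from
tlb_sizes[] and tlb_ways[], which the text names.  The sketch also assumes
only TLB0 and TLB1 are in use:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layouts, for illustration only. */
struct tlb_params_sketch {
	uint32_t tlb_sizes[4]; /* entries per TLB */
	uint32_t tlb_ways[4];
	uint32_t reserved[8];
};

struct tlb_entry_sketch {
	uint32_t mas8;
	uint32_t mas1;
	uint64_t mas2;
	uint64_t mas7_3;
};

/* Return 0 if array_len covers all TLB0 entries followed by all TLB1
 * entries, or -1 if userspace reserved too little memory. */
static int check_array_len(const struct tlb_params_sketch *p,
			   uint32_t array_len)
{
	uint64_t needed = ((uint64_t)p->tlb_sizes[0] + p->tlb_sizes[1]) *
			  sizeof(struct tlb_entry_sketch);

	return array_len >= needed ? 0 : -1;
}
```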
> +
> +While KVM_RUN is active, the shared region is under control of KVM.  Its
> +contents are undefined, and any modification by userspace results in
> +boundedly undefined behavior.
> +
> +On return from KVM_RUN, the shared region will reflect the current state of
> +the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
> +to tell KVM which entries have been changed, prior to calling KVM_RUN again
> +on this vcpu.

We already have another mechanism for such shared memory,
mmap(vcpu_fd).  x86 uses it for the coalesced mmio region as well as the
traditional kvm_run area.  Please consider using it.
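
Roughly, userspace would then get at the TLB the same way it gets at
kvm_run: size the mapping with KVM_GET_VCPU_MMAP_SIZE and mmap() the vcpu
fd at an agreed page offset.  The patch defines no such offset for the
TLB, so tlb_page_off below is hypothetical, as is the helper itself:

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical: map a shared TLB region that KVM would expose at page
 * offset tlb_page_off within the vcpu fd (page 0 being kvm_run).
 * Returns NULL on failure. */
static void *map_shared_tlb(int vcpu_fd, size_t tlb_bytes,
			    long tlb_page_off, long page_size)
{
	void *p = mmap(NULL, tlb_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
		       vcpu_fd, (off_t)tlb_page_off * page_size);

	return p == MAP_FAILED ? NULL : p;
}
```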

> +
> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
> + - The "params" field is of type "struct kvm_book3e_206_tlb_params".
> + - The "array" field points to an array of type "struct
> +   kvm_book3e_206_tlb_entry".
> + - The array consists of all entries in the first TLB, followed by all
> +   entries in the second TLB.
> + - Within a TLB, entries are ordered first by increasing set number.  Within a
> +   set, entries are ordered by way (increasing ESEL).
> + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
> +   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
> + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
> +   hardware ignores this value for TLB0.

Holy shit.

> @@ -95,6 +90,9 @@ struct kvmppc_vcpu_e500 {
>  	u32 tlb1cfg;
>  	u64 mcar;
>  
> +	struct page **shared_tlb_pages;
> +	int num_shared_tlb_pages;
> +

I missed the requirement that things be page aligned.

If you use mmap(vcpu_fd), this becomes simpler; you can use
__get_free_pages() and have a single pointer.  You can also use vmap() on
this array (but __get_free_pages() is faster).
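
With mmap(vcpu_fd), the kernel side only needs one physically contiguous
buffer.  Below is a sketch of picking the allocation order, mirroring what
the kernel's get_order() computes.  It is written as userspace C for
illustration, and order_for() is a made-up name:

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size. Assumes
 * size > 0. A __get_free_pages() allocation of that order is physically
 * contiguous, so one pointer replaces the struct page ** array. */
static unsigned int order_for(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}
```

The trade-off is that higher-order allocations can fail on a fragmented
system, which is where vmap() over individual pages wins.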

-- 
error compiling committee.c: too many arguments to function
