Re: RFC: New API for PPC for vcpu mmu access

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 11 Feb 2011 22:07:11 +0100
Alexander Graf <agraf@xxxxxxx> wrote:

> 
> On 11.02.2011, at 21:53, Scott Wood wrote:
> 
> > On Fri, 11 Feb 2011 02:41:35 +0100
> > Alexander Graf <agraf@xxxxxxx> wrote:
> > 
> >>>> Maybe we should go with Avi's proposal after all and simply keep the full soft-mmu synced between kernel and user space? That way we only need a setup call at first, no copying in between and simply update the user space version whenever something changes in the guest. We need to store the TLB's contents off somewhere anyways, so all we need is an additional in-kernel array with internal translation data, but that can be separate from the guest visible data, right?
> > 
> > Hmm, the idea is growing on me.
> > 
> >> So then everything we need to get all the functionality we need is a hint from kernel to user space that something changed and vice versa.
> >> 
> >> From kernel to user space is simple. We can just document that after every RUN, all fields can be modified.
> >> From user space to kernel, we could modify the entries directly and then pass in an ioctl that passes in a dirty bitmap to kernel space. KVM can then decide what to do with it. I guess the easiest implementation for now would be to ignore the bitmap and simply flush the shadow tlb.
> >> 
> >> That gives us the flush almost for free. All we need to do is set the tlb to all zeros (should be done by env init anyways) and pass in the "something changed" call. KVM can then decide to simply drop all of its shadow state or loop through every shadow entry and flush it individually. Maybe we should give a hint on the amount of flushes, so KVM can implement some threshold.
> > 
> > OK.  We'll also need a config ioctl to specify MMU type/size and the address
> > of the arrays.
> 
> Right, a setup call basically :).

OK, here goes v3:

[Note: one final thought after writing this up -- this isn't going to work
too well in cases where the guest can directly manipulate its TLB, such as
with the LRAT feature of Power Arch 2.06.  We'll still need a
copy-in/copy-out mechanism for that.]

struct kvmppc_book3e_tlb_entry {
	union {
		__u64 mas8_1;
		struct {
			__u32 mas8;
			__u32 mas1;
		};
	};
	__u64 mas2;
	union {
		__u64 mas7_3	
		struct {
			__u32 mas7;
			__u32 mas3;
		};
	};
};

For an MMU type of KVM_MMU_PPC_BOOK3E_NOHV, the mas8 in kvmppc_book3e_tlb_entry is
present but not supported.

struct kvmppc_book3e_tlb_params {
	/*
	 * book3e defines 4 TLBs.  Individual implementations may have
	 * fewer.  TLBs that do not already exist on the target must be
	 * configured with a size of zero.  A tlb_ways value of zero means
	 * the array is fully associative.  Only TLBs that are already
	 * set-associative on the target may be configured with a different
	 * associativity.  A set-associative TLB may not exceed 255 ways.
	 *
	 * KVM will adjust TLBnCFG based on the sizes configured here,
	 * though arrays greater than 2048 entries will have TLBnCFG[NENTRY]
	 * set to zero.
	 *
	 * The size of any TLB that is set-associative must be a multiple of
	 * the number of ways, and the number of sets must be a power of two.
	 *
	 * The page sizes supported by a TLB shall be determined by reading
	 * the TLB configuration registers.  This is not adjustable by userspace.
	 * [Note: need sregs]
	 */
	__u32 tlb_sizes[4];
	__u8 tlb_ways[4];
	__u32 reserved[11];
};

KVM_CONFIG_TLB
--------------

Capability: KVM_CAP_SW_TLB
Type: vcpu ioctl
Parameters: struct kvm_config_tlb (in)
Returns: 0 on success
         -1 on error

struct kvm_config_tlb {
	__u64 params;
	__u64 array;
	__u32 mmu_type;
	__u32 array_len;
};

Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM.  The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures.  The "array_len" field is an
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array.  It must be at least the size dictated by
"mmu_type" and "params".

On return from this call, KVM assumes the state of the TLBs to be empty. 
Prior to calling KVM_RUN, userspace must call KVM_DIRTY_TLB to tell KVM about
any valid entries.

While KVM_RUN is active, the shared region is under control of KVM.  Its
contents are undefined, and any modification by userspace results in boundedly
undefined behavior.

On return from KVM_RUN, the shared region will reflect the current state of
the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
to tell KVM which entries have been changed, prior to calling KVM_RUN again
on this vcpu.

For mmu types KVM_MMU_PPC_BOOK3E_NOHV and KVM_MMU_PPC_BOOK3E_HV:
 - The "params" field is of type "struct kvmppc_book3e_tlb_params".
 - The "array" field points to an array of type "struct kvmppc_book3e_tlb_entry".
 - The array consists of all entries in the first TLB, followed by all
   entries in the second TLB, etc.
 - Within a TLB, if the array is not set-associative, entries are ordered by
   increasing ESEL.  If the array is set-associative, entries are ordered first
   by set.  Within a set, entries are ordered by way (ESEL).
 - The hash for determining set number is:
     (MAS2[EPN] / page_size) & (num_sets - 1)
   where "page_size" is the smallest page size supported by the TLB, and
   "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
   If a book3e chip is made for which a different hash is needed, a new
   MMU type must be used, to ensure that userspace and KVM agree on layout.

KVM_DIRTY_TLB
-------------

Capability: KVM_CAP_SW_TLB
Type: vcpu ioctl
Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success
         -1 on error

struct kvm_dirty_tlb {
	__u64 bitmap;
	__u32 num_dirty;
};

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array.  This array consists
of a number of bits, equal to the total number of TLB entries as determined by the
last successful call to KVM_CONFIG_TLB, rounded up to the nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB array.

The array is little-endian: the bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything.  It must
be set to the number of set bits in the bitmap.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux