Re: [PATCH 4/4] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

On 14.11.2012, at 05:33, Paul Mackerras wrote:

> A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
> this fd return the contents of the HPT (hashed page table), writes
> create and/or remove entries in the HPT.  There is a new capability,
> KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
> takes an argument structure with the index of the first HPT entry to
> read out and a set of flags.  The flags indicate whether the user is
> intending to read or write the HPT, and whether to return all entries
> or only the "bolted" entries (those with the bolted bit, 0x10, set in
> the first doubleword).
> 
> This is intended for use in implementing qemu's savevm/loadvm and for
> live migration.  Therefore, on reads, the first pass returns information
> about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
> end of the HPT, it returns from the read.  Subsequent reads only return
> information about HPTEs that have changed since they were last read.
> A read that finds no changed HPTEs in the HPT following where the last
> read finished will return 0 bytes.
> 
> The format of the data provides a simple run-length compression of the
> invalid entries.  Each block of data starts with a header that indicates
> the index (position in the HPT, which is just an array), the number of
> valid entries starting at that index (may be zero), and the number of
> invalid entries following those valid entries.  The valid entries, 16
> bytes each, follow the header.  The invalid entries are not explicitly
> represented.
> 
> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxx>
> ---
> Documentation/virtual/kvm/api.txt        |   53 +++++
> arch/powerpc/include/asm/kvm_book3s_64.h |   18 ++
> arch/powerpc/include/asm/kvm_ppc.h       |    2 +
> arch/powerpc/include/uapi/asm/kvm.h      |   24 +++
> arch/powerpc/kvm/book3s_64_mmu_hv.c      |  344 ++++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c             |   12 --
> arch/powerpc/kvm/powerpc.c               |   17 ++
> include/uapi/linux/kvm.h                 |    3 +
> 8 files changed, 461 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 6671fdc..33080ea 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2071,6 +2071,59 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
> 
> Note that the vcpu ioctl is asynchronous to vcpu execution.
> 
> +4.78 KVM_PPC_GET_HTAB_FD
> +
> +Capability: KVM_CAP_PPC_HTAB_FD
> +Architectures: powerpc
> +Type: vm ioctl
> +Parameters: Pointer to struct kvm_get_htab_fd (in)
> +Returns: file descriptor number (>= 0) on success, -1 on error
> +
> +This returns a file descriptor that can be used either to read out the
> +entries in the guest's hashed page table (HPT), or to write entries to
> +initialize the HPT.  The returned fd can only be written to if the
> +KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
> +can only be read if that bit is clear.  The argument struct looks like
> +this:
> +
> +/* For KVM_PPC_GET_HTAB_FD */
> +struct kvm_get_htab_fd {
> +	__u64	flags;
> +	__u64	start_index;
> +};
> +
> +/* Values for kvm_get_htab_fd.flags */
> +#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
> +#define KVM_GET_HTAB_WRITE		((__u64)0x2)
> +
> +The `start_index' field gives the index in the HPT of the entry at
> +which to start reading.  It is ignored when writing.
> +
> +Reads on the fd will initially supply information about all
> +"interesting" HPT entries.  Interesting entries are those with the
> +bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
> +all entries.  When the end of the HPT is reached, the read() will
> +return.  If read() is called again on the fd, it will start again from
> +the beginning of the HPT, but will only return HPT entries that have
> +changed since they were last read.
> +
> +Data read or written is structured as a header (8 bytes) followed by a
> +series of valid HPT entries (16 bytes) each.  The header indicates how
> +many valid HPT entries there are and how many invalid entries follow
> +the valid entries.  The invalid entries are not represented explicitly
> +in the stream.  The header format is:
> +
> +struct kvm_get_htab_header {
> +	__u32	index;
> +	__u16	n_valid;
> +	__u16	n_invalid;
> +};
> +
> +Writes to the fd create HPT entries starting at the index given in the
> +header; first `n_valid' valid entries with contents from the data
> +written, then `n_invalid' invalid entries, invalidating any previously
> +valid entries found.
> +
> 
> 5. The kvm_run structure
> ------------------------
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 4ca4f25..dc0a78d 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -243,4 +243,22 @@ static inline bool slot_is_aligned(struct kvm_memory_slot *memslot,
> 	return !(memslot->base_gfn & mask) && !(memslot->npages & mask);
> }
> 
> +static inline unsigned long slb_pgsize_encoding(unsigned long psize)
> +{
> +	unsigned long senc = 0;
> +
> +	if (psize > 0x1000) {
> +		senc = SLB_VSID_L;
> +		if (psize == 0x10000)
> +			senc |= SLB_VSID_LP_01;

Is this always accurate?

> +	}
> +	return senc;
> +}
> +
> +static inline int is_vrma_hpte(unsigned long hpte_v)
> +{
> +	return (hpte_v & ~0xffffffUL) ==
> +		(HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)));
> +}
> +
> #endif /* __ASM_KVM_BOOK3S_64_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 609cca3..1ca31e9 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -164,6 +164,8 @@ extern void kvmppc_bookehv_exit(void);
> 
> extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
> 
> +extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
> +
> /*
>  * Cuts out inst bits with ordering according to spec.
>  * That means the leftmost bit is zero. All given bits are included.
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index b89ae4d..6518e38 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -331,6 +331,30 @@ struct kvm_book3e_206_tlb_params {
> 	__u32 reserved[8];
> };
> 
> +/* For KVM_PPC_GET_HTAB_FD */
> +struct kvm_get_htab_fd {
> +	__u64	flags;
> +	__u64	start_index;

Please add some reserved padding here in case we need to pass more information down later. 16 bytes should be enough.

The protocol itself looks quite good :). And if it ever stops working for us, we can always switch to a new protocol using the flags field in the ioctl argument above.
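
To make the stream format concrete, here is a rough userspace-side sketch of walking a buffer in the layout described in the quoted documentation: an 8-byte header, then `n_valid` entries of 16 bytes each, with the invalid entries carrying no data. The struct mirrors `kvm_get_htab_header` from the patch using stdint types; the helper name `htab_stream_count` and the `HPTE_SIZE` macro are made up for illustration and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HPTE_SIZE 16	/* bytes per valid HPT entry, per the patch description */

/* Mirrors struct kvm_get_htab_header from the patch, using stdint types. */
struct htab_header {
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

/*
 * Hypothetical helper: walk a buffer laid out as a sequence of blocks
 * (header followed by n_valid entries) and count the entries it describes.
 * Returns 0 on success, -1 if the buffer ends in the middle of a block.
 */
static int htab_stream_count(const unsigned char *buf, size_t len,
			     unsigned long *valid, unsigned long *invalid)
{
	struct htab_header hdr;
	size_t pos = 0;

	*valid = 0;
	*invalid = 0;
	while (pos < len) {
		size_t body;

		if (len - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr));
		pos += sizeof(hdr);

		/* Only the valid entries are present in the stream. */
		body = (size_t)hdr.n_valid * HPTE_SIZE;
		if (len - pos < body)
			return -1;
		pos += body;

		*valid += hdr.n_valid;
		*invalid += hdr.n_invalid;
	}
	return 0;
}
```

A real consumer would be handed such a buffer by read() on the fd returned from KVM_PPC_GET_HTAB_FD and would copy the entry data out instead of skipping over it. The sketch only checks that the framing is consistent.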


Alex
