Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/16/2012 06:01 AM, Paul Mackerras wrote:
> A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
> this fd return the contents of the HPT (hashed page table), writes
> create and/or remove entries in the HPT.  There is a new capability,
> KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
> takes an argument structure with the index of the first HPT entry to
> read out and a set of flags.  The flags indicate whether the user is
> intending to read or write the HPT, and whether to return all entries
> or only the "bolted" entries (those with the bolted bit, 0x10, set in
> the first doubleword).
> 
> This is intended for use in implementing qemu's savevm/loadvm and for
> live migration.  Therefore, on reads, the first pass returns information
> about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
> end of the HPT, it returns from the read.  Subsequent reads only return
> information about HPTEs that have changed since they were last read.
> A read that finds no changed HPTEs in the HPT following where the last
> read finished will return 0 bytes.

Copying people with interest in migration.

> +4.78 KVM_PPC_GET_HTAB_FD
> +
> +Capability: KVM_CAP_PPC_HTAB_FD
> +Architectures: powerpc
> +Type: vm ioctl
> +Parameters: Pointer to struct kvm_get_htab_fd (in)
> +Returns: file descriptor number (>= 0) on success, -1 on error
> +
> +This returns a file descriptor that can be used either to read out the
> +entries in the guest's hashed page table (HPT), or to write entries to
> +initialize the HPT.  The returned fd can only be written to if the
> +KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
> +can only be read if that bit is clear.  The argument struct looks like
> +this:
> +
> +/* For KVM_PPC_GET_HTAB_FD */
> +struct kvm_get_htab_fd {
> +	__u64	flags;
> +	__u64	start_index;
> +};
> +
> +/* Values for kvm_get_htab_fd.flags */
> +#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
> +#define KVM_GET_HTAB_WRITE		((__u64)0x2)
> +
> +The `start_index' field gives the index in the HPT of the entry at
> +which to start reading.  It is ignored when writing.
> +
> +Reads on the fd will initially supply information about all
> +"interesting" HPT entries.  Interesting entries are those with the
> +bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
> +all entries.  When the end of the HPT is reached, the read() will
> +return.  

What happens if the read buffer is smaller than the HPT size?

What happens if the read buffer size is not a multiple of entry size?

Does/should the fd support O_NONBLOCK and poll? (=waiting for an entry
to change).

> If read() is called again on the fd, it will start again from
> +the beginning of the HPT, but will only return HPT entries that have
> +changed since they were last read.
> +
> +Data read or written is structured as a header (8 bytes) followed by a
> +series of valid HPT entries (16 bytes) each.  The header indicates how
> +many valid HPT entries there are and how many invalid entries follow
> +the valid entries.  The invalid entries are not represented explicitly
> +in the stream.  The header format is:
> +
> +struct kvm_get_htab_header {
> +	__u32	index;
> +	__u16	n_valid;
> +	__u16	n_invalid;
> +};

This structure forces the kernel to return entries sequentially.  Will
this block changing the data structure in the future?  Or is the
hardware spec sufficiently strict that such changes are not realistic?

> +
> +Writes to the fd create HPT entries starting at the index given in the
> +header; first `n_valid' valid entries with contents from the data
> +written, then `n_invalid' invalid entries, invalidating any previously
> +valid entries found.

This scheme is a clever, original, and very interesting approach to live
migration.  That doesn't necessarily mean a NAK, we should see if it
makes sense for other migration APIs as well (we currently have
difficulties migrating very large/wide guests).

What is the typical number of entries in the HPT?  Do you have estimates
of the change rate?

Suppose new hardware arrives that supports nesting HPTs, so that kvm is
no longer synchronously aware of the guest HPT (similar to how NPT/EPT
made kvm unaware of guest virtual->physical translations on x86).  How
will we deal with that?  But I guess this will be a
non-guest-transparent and non-userspace-transparent change, unlike
NPT/EPT, so a userspace ABI addition will be needed anyway).


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux