Re: [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> The new PAPR nested guest API introduces the concept of a Guest State
> Buffer for communication about L2 guests between L1 and L0 hosts.
>
> In the new API, the L0 manages the L2 on behalf of the L1. This means
> that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
> table...), it must request the L0 perform the modification. If the
> nested host needs to read L2 state likewise this request must
> go through the L0.
>
> The Guest State Buffer is a Type-Length-Value style data format defined
> in the PAPR which assigns all relevant partition state a unique
> identity. Unlike a typical TLV format the length is redundant as the
> length of each identity is fixed but is included for checking
> correctness.
>
> A guest state buffer consists of an element count followed by a stream
> of elements, where elements are composed of an ID number, data length,
> then the data:
>
>   Header:
>
>    <---4 bytes--->
>   +----------------+-----
>   | Element Count  | Elements...
>   +----------------+-----
>
>   Element:
>
>    <----2 bytes---> <-2 bytes-> <-Length bytes->
>   +----------------+-----------+----------------+
>   | Guest State ID |  Length   |      Data      |
>   +----------------+-----------+----------------+
>
> Guest State IDs have other attributes defined in the PAPR such as
> whether they are per thread or per guest, or read-only.
>
> Introduce a library for using guest state buffers. This includes support
> for actions such as creating buffers, adding elements to buffers,
> reading the value of elements and parsing buffers. This will be used
> later by the PAPR nested guest support.

This is a tour de force in one of these things, so I hate to be
the "me smash with club" guy, but what if you allocated buffers
with enough room for all the state (or 99% of cases, in which
case an overflow would make an hcall)?

What's actually a fast-path that we don't get from the interrupt
return buffer? Getting and setting a few regs for MMIO emulation?


> Signed-off-by: Jordan Niethe <jpn@xxxxxxxxxxxxxxxxxx>
> ---
> v2:
>   - Add missing #ifdef CONFIG_VSXs
>   - Move files from lib/ to kvm/
>   - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
>   - Use kunit for guest state buffer tests
>   - Add configuration option for the tests
>   - Use macros for contiguous id ranges like GPRs
>   - Add some missing EXPORTs to functions
>   - HEIR element is a double word not a word
> ---
>  arch/powerpc/Kconfig.debug                    |  12 +
>  arch/powerpc/include/asm/guest-state-buffer.h | 901 ++++++++++++++++++
>  arch/powerpc/include/asm/kvm_book3s.h         |   2 +
>  arch/powerpc/kvm/Makefile                     |   3 +
>  arch/powerpc/kvm/guest-state-buffer.c         | 563 +++++++++++
>  arch/powerpc/kvm/test-guest-state-buffer.c    | 321 +++++++
>  6 files changed, 1802 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
>  create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
>  create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c
>
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 6aaf8dc60610..ed830a714720 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
>  	bool "Run self-tests of the MSI bitmap code"
>  	depends on DEBUG_KERNEL
>  
> +config GUEST_STATE_BUFFER_TEST
> +	def_tristate n
> +	prompt "Enable Guest State Buffer unit tests"
> +	depends on KUNIT
> +	depends on KVM_BOOK3S_HV_POSSIBLE
> +	default KUNIT_ALL_TESTS
> +	help
> +	  The Guest State Buffer is a data format specified in the PAPR.
> +	  It is by hcalls to communicate the state of L2 guests between
> +	  the L1 and L0 hypervisors. Enable unit tests for the library
> +	  used to create and use guest state buffers.
> +
>  config PPC_IRQ_SOFT_MASK_DEBUG
>  	bool "Include extra checks for powerpc irq soft masking"
>  	depends on PPC64
> diff --git a/arch/powerpc/include/asm/guest-state-buffer.h b/arch/powerpc/include/asm/guest-state-buffer.h
> new file mode 100644
> index 000000000000..65a840abf1bb
> --- /dev/null
> +++ b/arch/powerpc/include/asm/guest-state-buffer.h
> @@ -0,0 +1,901 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Interface based on include/net/netlink.h
> + */
> +#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
> +#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
> +
> +#include <linux/gfp.h>
> +#include <linux/bitmap.h>
> +#include <asm/plpar_wrappers.h>
> +
> +/**************************************************************************
> + * Guest State Buffer Constants
> + **************************************************************************/
> +#define GSID_BLANK			0x0000

The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
you're calling the API that.

> +
> +#define GSID_HOST_STATE_SIZE		0x0001 /* Size of Hypervisor Internal Format VCPU state */
> +#define GSID_RUN_OUTPUT_MIN_SIZE	0x0002 /* Minimum size of the Run VCPU output buffer */
> +#define GSID_LOGICAL_PVR		0x0003 /* Logical PVR */
> +#define GSID_TB_OFFSET			0x0004 /* Timebase Offset */
> +#define GSID_PARTITION_TABLE		0x0005 /* Partition Scoped Page Table */
> +#define GSID_PROCESS_TABLE		0x0006 /* Process Table */

> +
> +#define GSID_RUN_INPUT			0x0C00 /* Run VCPU Input Buffer */
> +#define GSID_RUN_OUTPUT			0x0C01 /* Run VCPU Out Buffer */
> +#define GSID_VPA			0x0C02 /* HRA to Guest VCPU VPA */
> +
> +#define GSID_GPR(x)			(0x1000 + (x))
> +#define GSID_HDEC_EXPIRY_TB		0x1020
> +#define GSID_NIA			0x1021
> +#define GSID_MSR			0x1022
> +#define GSID_LR				0x1023
> +#define GSID_XER			0x1024
> +#define GSID_CTR			0x1025
> +#define GSID_CFAR			0x1026
> +#define GSID_SRR0			0x1027
> +#define GSID_SRR1			0x1028
> +#define GSID_DAR			0x1029

It's a shame you have to rip up all your wrapper functions now to
shoehorn these in.

If you included names analogous to the reg field names in the kvm
structures, the wrappers could do macro expansions that get them.

#define __GSID_WRAPPER_dar		GSID_DAR

Or similar.

And since of course you have to explicitly enumerate all these, I
wouldn't mind defining the types and lengths up-front rather than
down in the type function. You'd like to be able to go through the
spec and eyeball type, number, size.

[snip]

> +/**
> + * gsb_paddress() - the physical address of buffer
> + * @gsb: guest state buffer
> + *
> + * Returns the physical address of the buffer.
> + */
> +static inline u64 gsb_paddress(struct gs_buff *gsb)
> +{
> +	return __pa(gsb_header(gsb));
> +}

> +/**
> + * __gse_put_reg() - add a register type guest state element to a buffer
> + * @gsb: guest state buffer to add element to
> + * @iden: guest state ID
> + * @val: host endian value
> + *
> + * Adds a register type guest state element. Uses the guest state ID for
> + * determining the length of the guest element. If the guest state ID has
> + * bits that can not be set they will be cleared.
> + */
> +static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
> +{
> +	val &= gsid_mask(iden);
> +	if (gsid_size(iden) == sizeof(u64))
> +		return gse_put_u64(gsb, iden, val);
> +
> +	if (gsid_size(iden) == sizeof(u32)) {
> +		u32 tmp;
> +
> +		tmp = (u32)val;
> +		if (tmp != val)
> +			return -EINVAL;
> +
> +		return gse_put_u32(gsb, iden, tmp);
> +	}
> +	return -EINVAL;
> +}

There is a clever accessor that derives the length from the type, but
then you fall back to this.

> +/**
> + * gse_put - add a guest state element to a buffer
> + * @gsb: guest state buffer to add to
> + * @iden: guest state identity
> + * @v: generic value
> + */
> +#define gse_put(gsb, iden, v)					\
> +	(_Generic((v),						\
> +		  u64 : __gse_put_reg,				\
> +		  long unsigned int : __gse_put_reg,		\
> +		  u32 : __gse_put_reg,				\
> +		  struct gs_buff_info : gse_put_buff_info,	\
> +		  struct gs_proc_table : gse_put_proc_table,	\
> +		  struct gs_part_table : gse_put_part_table,	\
> +		  vector128 : gse_put_vector128)(gsb, iden, v))
> +
> +/**
> + * gse_get - return the data of a guest state element
> + * @gsb: guest state element to add to
> + * @v: generic value pointer to return in
> + */
> +#define gse_get(gse, v)						\
> +	(*v = (_Generic((v),					\
> +			u64 * : __gse_get_reg,			\
> +			unsigned long * : __gse_get_reg,	\
> +			u32 * : __gse_get_reg,			\
> +			vector128 * : gse_get_vector128)(gse)))

I don't see the benefit of this. Caller always knows the type doesn't
it? It seems like the right function could be called directly. It
makes the calling convention a bit clunky too. I know there's similar
precedent for uaccess functions, but not sure I like it for this.

> +struct gs_buff *gsb_new(size_t size, unsigned long guest_id,
> +			unsigned long vcpu_id, gfp_t flags)
> +{
> +	struct gs_buff *gsb;
> +
> +	gsb = kzalloc(sizeof(*gsb), flags);
> +	if (!gsb)
> +		return NULL;
> +
> +	size = roundup_pow_of_two(size);
> +	gsb->hdr = kzalloc(size, GFP_KERNEL);
> +	if (!gsb->hdr)
> +		goto free;
> +
> +	gsb->capacity = size;
> +	gsb->len = sizeof(struct gs_header);
> +	gsb->vcpu_id = vcpu_id;
> +	gsb->guest_id = guest_id;
> +
> +	gsb->hdr->nelems = cpu_to_be32(0);
> +
> +	return gsb;
> +
> +free:
> +	kfree(gsb);
> +	return NULL;
> +}
> +EXPORT_SYMBOL(gsb_new);

Should all be GPL exports.

Needs more namespace too, I reckon (not just exports but any kernel-wide
name this short and non-descriptive needs a kvmppc or kvm_papr or
something).

Thanks,
Nick




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux