Re: [PATCH v14 10/25] gunyah: rsc_mgr: Add RPC for sharing memory

Bjorn Andersson <andersson@xxxxxxxxxx> · Sat, 5 Aug 2023 10:55:14 -0700

On Tue, Jun 13, 2023 at 10:20:38AM -0700, Elliot Berman wrote:
> Gunyah resource manager provides API to manipulate stage 2 page tables.
> Manipulations are represented as a memory parcel. Memory parcels
> describe a list of memory regions (intermediate physical address--IPA--and
> size), a list of new permissions for VMs, and the memory type (DDR or
> MMIO). Each memory parcel is uniquely identified by a handle allocated by
> Gunyah. There are a few types of memory parcel sharing which Gunyah
> supports:
> 
>  - Sharing: the guest and host VM both have access
>  - Lending: only the guest has access; host VM loses access
>  - Donating: Permanently lent (not reclaimed even if guest shuts down)
> 
> Memory parcels that have been shared or lent can be reclaimed by the
> host via an additional call. The reclaim operation restores the original
> access the host VM had to the memory parcel and removes the access to
> other VM.
> 
> One point to note that memory parcels don't describe where in the guest
> VM the memory parcel should reside. The guest VM must accept the memory
> parcel either explicitly via a "gh_rm_mem_accept" call (not introduced
> here) or be configured to accept it automatically at boot. When the guest
> VM accepts the memory parcel, it also mentions the IPA it wants to place
> memory parcel. Although the region might be discontiguous on the host,
> the memory parcel is place contiguously in the guest memory at the
> specified IPA.
> 
> Reviewed-by: Alex Elder <elder@xxxxxxxxxx>
> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@xxxxxxxxxx>
> Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@xxxxxxxxxxx>
> Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@xxxxxxxxxxx>
> Signed-off-by: Elliot Berman <quic_eberman@xxxxxxxxxxx>
> ---
>  drivers/virt/gunyah/rsc_mgr_rpc.c | 227 ++++++++++++++++++++++++++++++
>  include/linux/gunyah_rsc_mgr.h    |  48 +++++++
>  2 files changed, 275 insertions(+)
> 
> diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
> index a4a9f0ba4e1fc..25064123a31c3 100644
> --- a/drivers/virt/gunyah/rsc_mgr_rpc.c
> +++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
> @@ -6,6 +6,12 @@
>  #include <linux/gunyah_rsc_mgr.h>
>  #include "rsc_mgr.h"
>  
> +/* Message IDs: Memory Management */
> +#define GH_RM_RPC_MEM_LEND			0x51000012
> +#define GH_RM_RPC_MEM_SHARE			0x51000013
> +#define GH_RM_RPC_MEM_RECLAIM			0x51000015
> +#define GH_RM_RPC_MEM_APPEND			0x51000018
> +
>  /* Message IDs: VM Management */
>  #define GH_RM_RPC_VM_ALLOC_VMID			0x56000001
>  #define GH_RM_RPC_VM_DEALLOC_VMID		0x56000002
> @@ -22,6 +28,46 @@ struct gh_rm_vm_common_vmid_req {
>  	__le16 _padding;
>  } __packed;
>  
> +/* Call: MEM_LEND, MEM_SHARE */
> +#define GH_MEM_SHARE_REQ_FLAGS_APPEND		BIT(1)
> +
> +struct gh_rm_mem_share_req_header {
> +	u8 mem_type;
> +	u8 _padding0;
> +	u8 flags;
> +	u8 _padding1;
> +	__le32 label;
> +} __packed;
> +
> +struct gh_rm_mem_share_req_acl_section {
> +	__le32 n_entries;
> +	struct gh_rm_mem_acl_entry entries[];
> +};
> +
> +struct gh_rm_mem_share_req_mem_section {
> +	__le16 n_entries;
> +	__le16 _padding;
> +	struct gh_rm_mem_entry entries[];
> +};

Any reason why these two are not explicitly packed?

> +
> +/* Call: MEM_RELEASE */
> +struct gh_rm_mem_release_req {
> +	__le32 mem_handle;
> +	u8 flags; /* currently not used */
> +	u8 _padding0;
> +	__le16 _padding1;
> +} __packed;
> +
> +/* Call: MEM_APPEND */
> +#define GH_MEM_APPEND_REQ_FLAGS_END		BIT(0)
> +
> +struct gh_rm_mem_append_req_header {
> +	__le32 mem_handle;
> +	u8 flags;
> +	u8 _padding0;
> +	__le16 _padding1;
> +} __packed;
> +
>  /* Call: VM_ALLOC */
>  struct gh_rm_vm_alloc_vmid_resp {
>  	__le16 vmid;
> @@ -51,6 +97,8 @@ struct gh_rm_vm_config_image_req {
>  	__le64 dtb_size;
>  } __packed;
>  
> +#define GH_RM_MAX_MEM_ENTRIES	512
> +
>  /*
>   * Several RM calls take only a VMID as a parameter and give only standard
>   * response back. Deduplicate boilerplate code by using this common call.
> @@ -64,6 +112,185 @@ static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
>  	return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
>  }
>  
> +static int _gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle, bool end_append,
> +			struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
> +{
> +	struct gh_rm_mem_share_req_mem_section *mem_section;

"sections" is sufficient.

> +	struct gh_rm_mem_append_req_header *req_header;
> +	size_t msg_size = 0;
> +	void *msg;
> +	int ret;
> +
> +	msg_size += sizeof(struct gh_rm_mem_append_req_header);

msg_size is sizeof(header) + sizeof(sections), but written this way you
state that it's:

msg_size is X + sizeof(header),
and msg_size is also sizeof(sections)

It's the same thing, but you're forcing the reader to look for the
initial value of msg_size.

> +	msg_size += struct_size(mem_section, entries, n_mem_entries);
> +
> +	msg = kzalloc(msg_size, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	req_header = msg;

Can't you just make msg a strcut gh_rm_mem_append_req_header?

> +	mem_section = (void *)(req_header + 1);
> +
> +	req_header->mem_handle = cpu_to_le32(mem_handle);
> +	if (end_append)
> +		req_header->flags |= GH_MEM_APPEND_REQ_FLAGS_END;
> +
> +	mem_section->n_entries = cpu_to_le16(n_mem_entries);
> +	memcpy(mem_section->entries, mem_entries, sizeof(*mem_entries) * n_mem_entries);
> +
> +	ret = gh_rm_call(rm, GH_RM_RPC_MEM_APPEND, msg, msg_size, NULL, NULL);
> +	kfree(msg);
> +
> +	return ret;
> +}
> +
> +static int gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle,
> +			struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)

You can name these arguments "entries" and "n_entries" (or num_entries)
without loosing any infromation, but saving a whole bunch of characters
that people have to read.

> +{
> +	bool end_append;

Iiuc this means "is this the last entry", perhaps calling it "last"
would make its purpose immediately obvious.

> +	int ret = 0;
> +	size_t n;
> +
> +	while (n_mem_entries) {
> +		if (n_mem_entries > GH_RM_MAX_MEM_ENTRIES) {
> +			end_append = false;
> +			n = GH_RM_MAX_MEM_ENTRIES;
> +		} else {
> +			end_append = true;
> +			n = n_mem_entries;
> +		}
> +
> +		ret = _gh_rm_mem_append(rm, mem_handle, end_append, mem_entries, n);

Every loop here you kzallc() and kfree() the same chunk of scratch
memory. If you inline _gh_rm_mem_append() here and do the allocation
once based on GH_RM_MAX_MEM_ENTRIES this would be faster and seemingly
easier to read/follow.

> +		if (ret)
> +			break;
> +
> +		mem_entries += n;
> +		n_mem_entries -= n;
> +	}
> +
> +	return ret;
> +}
> +
> +static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_mem_parcel *p)
> +{
> +	size_t msg_size = 0, initial_mem_entries = p->n_mem_entries, resp_size;
> +	size_t acl_section_size, mem_section_size;

You keep all other variables one per line, and in reverse x-mas style.
Please do the same for this.

Naming these acl_size and mem_size would be beneficial.

> +	struct gh_rm_mem_share_req_acl_section *acl_section;
> +	struct gh_rm_mem_share_req_mem_section *mem_section;

Naming these acl and mem will make below easier to read.

> +	struct gh_rm_mem_share_req_header *req_header;
> +	u32 *attr_section;
> +	__le32 *resp;
> +	void *msg;
> +	int ret;
> +
> +	if (!p->acl_entries || !p->n_acl_entries || !p->mem_entries || !p->n_mem_entries ||
> +	    p->n_acl_entries > U8_MAX || p->mem_handle != GH_MEM_HANDLE_INVAL)
> +		return -EINVAL;
> +
> +	if (initial_mem_entries > GH_RM_MAX_MEM_ENTRIES)
> +		initial_mem_entries = GH_RM_MAX_MEM_ENTRIES;
> +
> +	acl_section_size = struct_size(acl_section, entries, p->n_acl_entries);
> +	mem_section_size = struct_size(mem_section, entries, initial_mem_entries);

An empty line here seems reasonable.

> +	/* The format of the message goes:
> +	 * request header
> +	 * ACL entries (which VMs get what kind of access to this memory parcel)
> +	 * Memory entries (list of memory regions to share)
> +	 * Memory attributes (currently unused, we'll hard-code the size to 0)
> +	 */
> +	msg_size += sizeof(struct gh_rm_mem_share_req_header);

Hidden above is the confirmation that msg_size was initialized to 0,
drop the '+' here to make it clear.

> +	msg_size += acl_section_size;
> +	msg_size += mem_section_size;
> +	msg_size += sizeof(u32); /* for memory attributes, currently unused */
> +
> +	msg = kzalloc(msg_size, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	req_header = msg;

I don't think you need to keep req_header and msg separately.

> +	acl_section = (void *)req_header + sizeof(*req_header);
> +	mem_section = (void *)acl_section + acl_section_size;
> +	attr_section = (void *)mem_section + mem_section_size;
> +
> +	req_header->mem_type = p->mem_type;
> +	if (initial_mem_entries != p->n_mem_entries)

I think this will only happen if p->n_mem_entries >
GH_RM_MAX_MEM_ENTRIES, for which you already have a condition earlier.
Set the flag in that condition rather than spreading it over two (three)
code paths.

> +		req_header->flags |= GH_MEM_SHARE_REQ_FLAGS_APPEND;
> +	req_header->label = cpu_to_le32(p->label);
> +
> +	acl_section->n_entries = cpu_to_le32(p->n_acl_entries);
> +	memcpy(acl_section->entries, p->acl_entries,
> +		flex_array_size(acl_section, entries, p->n_acl_entries));
> +
> +	mem_section->n_entries = cpu_to_le16(initial_mem_entries);
> +	memcpy(mem_section->entries, p->mem_entries,
> +		flex_array_size(mem_section, entries, initial_mem_entries));
> +
> +	/* Set n_entries for memory attribute section to 0 */
> +	*attr_section = 0;
> +
> +	ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp, &resp_size);
> +	kfree(msg);
> +
> +	if (ret)
> +		return ret;
> +
> +	p->mem_handle = le32_to_cpu(*resp);
> +	kfree(resp);
> +
> +	if (initial_mem_entries != p->n_mem_entries) {

Another "we took that conditional path above", please make it obvious.

Can you please also write a comment describing that this operation is
split in a main one, followed by one of more appended operations as
necessary. This took me too long time to realize...

In particular I was wondering why "initial" didn't mean "the original
value", but now I see that it probably means "the value relevant for the
initial request".

> +		ret = gh_rm_mem_append(rm, p->mem_handle,
> +					&p->mem_entries[initial_mem_entries],
> +					p->n_mem_entries - initial_mem_entries);
> +		if (ret) {
> +			gh_rm_mem_reclaim(rm, p);
> +			p->mem_handle = GH_MEM_HANDLE_INVAL;

What happens to the entries that was already lended or shared?

Regards,
Bjorn