Re: [PATCH v7 08/45] arm64: RME: ioctls to create and configure realms

Gavin Shan <gshan@xxxxxxxxxx> · Mon, 3 Mar 2025 14:42:04 +1000




On 2/14/25 2:13 AM, Steven Price wrote:
Add the KVM_CAP_ARM_RME_CREATE_RD ioctl to create a realm. This involves
delegating pages to the RMM to hold the Realm Descriptor (RD) and for
the base level of the Realm Translation Tables (RTT). A VMID also need
to be picked, since the RMM has a separate VMID address space a
dedicated allocator is added for this purpose.

KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm
before it is created. Configuration options can be classified as:

  1. Parameters specific to the Realm stage2 (e.g. IPA Size, vmid, stage2
     entry level, entry level RTTs, number of RTTs in start level, LPA2)
     Most of these are not measured by RMM and comes from KVM book
     keeping.

  2. Parameters controlling "Arm Architecture features for the VM". (e.g.
     SVE VL, PMU counters, number of HW BRPs/WPs), configured by the VMM
     using the "user ID register write" mechanism. These will be
     supported in the later patches.

  3. Parameters are not part of the core Arm architecture but defined
     by the RMM spec (e.g. Hash algorithm for measurement,
     Personalisation value). These are programmed via
     KVM_CAP_ARM_RME_CONFIG_REALM.

For the IPA size there is the possibility that the RMM supports a
different size to the IPA size supported by KVM for normal guests. At
the moment the 'normal limit' is exposed by KVM_CAP_ARM_VM_IPA_SIZE and
the IPA size is configured by the bottom bits of vm_type in
KVM_CREATE_VM. This means that it isn't easy for the VMM to discover
what IPA sizes are supported for Realm guests. Since the IPA is part of
the measurement of the realm guest the current expectation is that the
VMM will be required to pick the IPA size demanded by attestation and
therefore simply failing if this isn't available is fine. An option
would be to expose a new capability ioctl to obtain the RMM's maximum
IPA size if this is needed in the future.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
Signed-off-by: Steven Price <steven.price@xxxxxxx>
---
Changes since v6:
  * Separate RMM RTT calculations from host PAGE_SIZE. This allows the
    host page size to be larger than 4k while still communicating with an
    RMM which uses 4k granules.
Changes since v5:
  * Introduce free_delegated_granule() to replace many
    undelegate/free_page() instances and centralise the comment on
    leaking when the undelegate fails.
  * Several other minor improvements suggested by reviews - thanks for
    the feedback!
Changes since v2:
  * Improved commit description.
  * Improved return failures for rmi_check_version().
  * Clear contents of PGD after it has been undelegated in case the RMM
    left stale data.
  * Minor changes to reflect changes in previous patches.
---
  arch/arm64/include/asm/kvm_emulate.h |   5 +
  arch/arm64/include/asm/kvm_rme.h     |  19 ++
  arch/arm64/kvm/arm.c                 |  16 ++
  arch/arm64/kvm/mmu.c                 |  22 +-
  arch/arm64/kvm/rme.c                 | 322 +++++++++++++++++++++++++++
  5 files changed, 382 insertions(+), 2 deletions(-)


With below comments addressed:

Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx>

[...]

+
+static int realm_create_rd(struct kvm *kvm)
+{
+	struct realm *realm = &kvm->arch.realm;
+	struct realm_params *params = realm->params;
+	void *rd = NULL;
+	phys_addr_t rd_phys, params_phys;
+	size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
+	int i, r;
+	int rtt_num_start;
+
+	realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
+	rtt_num_start = realm_num_root_rtts(realm);
+
+	if (WARN_ON(realm->rd) || WARN_ON(!realm->params))
+		return -EEXIST;
+

Two WARN_ON() can be combined into one.

	if (WARN_ON(realm->rd || !realm->param))

+	if (pgd_size / RMM_PAGE_SIZE < rtt_num_start)
+		return -EINVAL;
+
+	rd = (void *)__get_free_page(GFP_KERNEL);
+	if (!rd)
+		return -ENOMEM;
+
+	rd_phys = virt_to_phys(rd);
+	if (rmi_granule_delegate(rd_phys)) {
+		r = -ENXIO;
+		goto free_rd;
+	}
+
+	for (i = 0; i < pgd_size; i += RMM_PAGE_SIZE) {
+		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i;
+
+		if (rmi_granule_delegate(pgd_phys)) {
+			r = -ENXIO;
+			goto out_undelegate_tables;
+		}
+	}
+
+	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
+	params->rtt_level_start = get_start_level(realm);
+	params->rtt_num_start = rtt_num_start;
+	params->rtt_base = kvm->arch.mmu.pgd_phys;
+	params->vmid = realm->vmid;
+
+	params_phys = virt_to_phys(params);
+
+	if (rmi_realm_create(rd_phys, params_phys)) {
+		r = -ENXIO;
+		goto out_undelegate_tables;
+	}
+
+	if (WARN_ON(rmi_rec_aux_count(rd_phys, &realm->num_aux))) {
+		WARN_ON(rmi_realm_destroy(rd_phys));
+		goto out_undelegate_tables;
+	}
+
+	realm->rd = rd;
+
+	return 0;
+
+out_undelegate_tables:
+	while (i > 0) {
+		i -= RMM_PAGE_SIZE;
+
+		phys_addr_t pgd_phys = kvm->arch.mmu.pgd_phys + i;
+
+		if (WARN_ON(rmi_granule_undelegate(pgd_phys))) {
+			/* Leak the pages if they cannot be returned */
+			kvm->arch.mmu.pgt = NULL;
+			break;
+		}
+	}
+	if (WARN_ON(rmi_granule_undelegate(rd_phys))) {
+		/* Leak the page if it isn't returned */
+		return r;
+	}
+free_rd:
+	free_page((unsigned long)rd);
+	return r;
+}
+

[...]

+
+int kvm_init_realm_vm(struct kvm *kvm)
+{
+	struct realm_params *params;
+
+	params = (struct realm_params *)get_zeroed_page(GFP_KERNEL);
+	if (!params)
+		return -ENOMEM;
+
+	kvm->arch.realm.params = params;
+	return 0;
+}
+

The local variable @params is unnecessary, something like below.

	kvm->arch.realm.params = (struct realm_parms *)get_zeroed_page(GFP_KERNEL);

  void kvm_init_rme(void)
  {
  	if (PAGE_SIZE != SZ_4K)
@@ -52,5 +368,11 @@ void kvm_init_rme(void)
  		/* Continue without realm support */
  		return;
  
+	if (WARN_ON(rmi_features(0, &rmm_feat_reg0)))
+		return;
+
+	if (rme_vmid_init())
+		return;
+
  	/* Future patch will enable static branch kvm_rme_is_available */
  }

Thanks,
Gavin