Hi Suzuki,

Sorry for the very slow response to this. Coming back to this I'm having
doubts, see below.

On 17/10/2024 14:00, Suzuki K Poulose wrote:
> On 04/10/2024 16:27, Steven Price wrote:
>> Entering a realm is done using a SMC call to the RMM. On exit the
>> exit-codes need to be handled slightly differently to the normal KVM
>> path so define our own functions for realm enter/exit and hook them
>> in if the guest is a realm guest.
>>
>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
...
>> diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
>> new file mode 100644
>> index 000000000000..e96ea308212c
>> --- /dev/null
>> +++ b/arch/arm64/kvm/rme-exit.c
...
>> +static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct realm *realm = &kvm->arch.realm;
>> +	struct realm_rec *rec = &vcpu->arch.rec;
>> +	unsigned long base = rec->run->exit.ripas_base;
>> +	unsigned long top = rec->run->exit.ripas_top;
>> +	unsigned long ripas = rec->run->exit.ripas_value;
>> +	unsigned long top_ipa;
>> +	int ret;
>> +
>> +	if (!realm_is_addr_protected(realm, base) ||
>> +	    !realm_is_addr_protected(realm, top - 1)) {
>> +		kvm_err("Invalid RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
>> +			base, top, ripas);
>> +		return -EINVAL;
>> +	}
>> +
>> +	kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
>> +				   kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
>
> I think we also need to filter the request for RIPAS_RAM, by consulting
> if the "range" is backed by a memslot or not. If they are not, we should
> reject the request with a response flag set in run.enter.flags.

It's an interesting API question. At the moment there is no requirement
to have an active memslot to set the RIPAS - this is true both during
the setup by the VMM and at run time. In theory a VMM can create/destroy
memslots while the guest is running. So absence of a memslot doesn't
actually imply that the RIPAS change should be rejected.

Obviously with realms this is tricky because when destroying a memslot
that's in use KVM would rip those pages out from the guest and it would
require guest cooperation to restore those pages (transition to
RIPAS_EMPTY and back to RIPAS_RAM). But it's not something that has been
prohibited so far.

On the other hand this is a clear way for a (malicious/buggy) guest to
use a fair bit of RAM by transitioning to RIPAS_RAM (sparse) pages not
in a memslot and forcing KVM to allocate the RTT pages to delegate to
the RMM. But we do exit to the VMM, so this is solvable in the VMM (by
killing a misbehaving guest). The number of pages this would consume per
exit is also fairly small.

So my instinct is that we shouldn't impose that requirement. Any
thoughts?

> As for EMPTY requests, if the guest wants to explicitly mark any range
> as EMPTY, it doesn't matter, as long as it is within the protected IPA.
> (even though they may be EMPTY in the first place).
>
>> +	write_lock(&kvm->mmu_lock);
>> +	ret = realm_set_ipa_state(vcpu, base, top, ripas, &top_ipa);
>> +	write_unlock(&kvm->mmu_lock);
>> +
>> +	WARN(ret && ret != -ENOMEM,
>> +	     "Unable to satisfy RIPAS_CHANGE for %#lx - %#lx, ripas:
>> %#lx\n",
>> +	     base, top, ripas);
>> +
>> +	/* Exit to VMM to complete the change */
>> +	kvm_prepare_memory_fault_exit(vcpu, base, top_ipa - base, false,
>> false,
>> +				      ripas == RMI_RAM);
>
> Again this may only be need if the range is backed by a memslot ?
> Otherwise the VMM has nothing to do.

Assuming the above, then the VMM would be the one to kill a misbehaving
guest, so would need a notification.

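To make the "solvable in the VMM" option a bit more concrete, the
accounting could look roughly like the sketch below. This is only an
illustration: struct vmm_slot, struct ripas_policy,
unbacked_in_range() and ripas_ram_exit_allowed() are hypothetical
names, not existing KVM or VMM interfaces. It assumes memslots never
overlap in IPA space and that gpa/size are the values reported by the
memory fault exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMM-side view of one memslot: IPA range [base, base + size). */
struct vmm_slot {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical per-guest policy state kept by the VMM. */
struct ripas_policy {
	const struct vmm_slot *slots;	/* assumed non-overlapping */
	size_t nr_slots;
	uint64_t unbacked_bytes;	/* RIPAS_RAM bytes seen outside any slot */
	uint64_t unbacked_limit;	/* budget before the guest is killed */
};

/* Return how many bytes of [base, top) are not covered by any slot. */
static uint64_t unbacked_in_range(const struct ripas_policy *p,
				  uint64_t base, uint64_t top)
{
	uint64_t covered = 0;

	assert(top > base);
	for (size_t i = 0; i < p->nr_slots; i++) {
		uint64_t start = p->slots[i].base;
		uint64_t end = start + p->slots[i].size;
		uint64_t lo = start > base ? start : base;
		uint64_t hi = end < top ? end : top;

		/* Only count the part of the slot inside [base, top). */
		if (lo < hi)
			covered += hi - lo;
	}
	return (top - base) - covered;
}

/*
 * Charge a reported change to RIPAS_RAM against the guest's budget.
 * Returns true if the guest may continue, false if the VMM should kill it.
 * Simplification: nothing is refunded when a range goes back to EMPTY.
 */
static bool ripas_ram_exit_allowed(struct ripas_policy *p,
				   uint64_t gpa, uint64_t size)
{
	p->unbacked_bytes += unbacked_in_range(p, gpa, gpa + size);
	return p->unbacked_bytes <= p->unbacked_limit;
}
```

A VMM would call ripas_ram_exit_allowed() for each exit that reports a
change to RIPAS_RAM and tear the guest down once it returns false.
Where to set the limit is purely a VMM policy decision.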

Thanks,
Steve

>> +
>> +	return 0;
>> +}
>> +
>> +static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
>> +{
>> +	struct realm_rec *rec = &vcpu->arch.rec;
>> +
>> +	__vcpu_sys_reg(vcpu, CNTV_CTL_EL0) = rec->run->exit.cntv_ctl;
>> +	__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0) = rec->run->exit.cntv_cval;
>> +	__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = rec->run->exit.cntp_ctl;
>> +	__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = rec->run->exit.cntp_cval;
>> +
>> +	kvm_realm_timers_update(vcpu);
>> +}
>> +
>> +/*
>> + * Return > 0 to return to guest, < 0 on error, 0 (and set
>> exit_reason) on
>> + * proper exit to userspace.
>> + */
>> +int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
>> +{
>> +	struct realm_rec *rec = &vcpu->arch.rec;
>> +	u8 esr_ec = ESR_ELx_EC(rec->run->exit.esr);
>> +	unsigned long status, index;
>> +
>> +	status = RMI_RETURN_STATUS(rec_run_ret);
>> +	index = RMI_RETURN_INDEX(rec_run_ret);
>> +
>> +	/*
>> +	 * If a PSCI_SYSTEM_OFF request raced with a vcpu executing, we
>> might
>> +	 * see the following status code and index indicating an attempt
>> to run
>> +	 * a REC when the RD state is SYSTEM_OFF. In this case, we just
>> need to
>> +	 * return to user space which can deal with the system event or
>> will try
>> +	 * to run the KVM VCPU again, at which point we will no longer
>> attempt
>> +	 * to enter the Realm because we will have a sleep request
>> pending on
>> +	 * the VCPU as a result of KVM's PSCI handling.
>> +	 */
>> +	if (status == RMI_ERROR_REALM && index == 1) {
>> +		vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
>> +		return 0;
>> +	}
>> +
>> +	if (rec_run_ret)
>> +		return -ENXIO;
>> +
>> +	vcpu->arch.fault.esr_el2 = rec->run->exit.esr;
>> +	vcpu->arch.fault.far_el2 = rec->run->exit.far;
>> +	vcpu->arch.fault.hpfar_el2 = rec->run->exit.hpfar;
>> +
>> +	update_arch_timer_irq_lines(vcpu);
>> +
>> +	/* Reset the emulation flags for the next run of the REC */
>> +	rec->run->enter.flags = 0;
>> +
>> +	switch (rec->run->exit.exit_reason) {
>> +	case RMI_EXIT_SYNC:
>> +		return rec_exit_handlers[esr_ec](vcpu);
>> +	case RMI_EXIT_IRQ:
>> +	case RMI_EXIT_FIQ:
>> +		return 1;
>> +	case RMI_EXIT_PSCI:
>> +		return rec_exit_psci(vcpu);
>> +	case RMI_EXIT_RIPAS_CHANGE:
>> +		return rec_exit_ripas_change(vcpu);
>> +	}
>> +
>> +	kvm_pr_unimpl("Unsupported exit reason: %u\n",
>> +		      rec->run->exit.exit_reason);
>> +	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> +	return 0;
>> +}
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 1fa9991d708b..4c0751231810 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -899,6 +899,25 @@ void kvm_destroy_realm(struct kvm *kvm)
>>  	kvm_free_stage2_pgd(&kvm->arch.mmu);
>>  }
>> +int kvm_rec_enter(struct kvm_vcpu *vcpu)
>> +{
>> +	struct realm_rec *rec = &vcpu->arch.rec;
>> +
>> +	switch (rec->run->exit.exit_reason) {
>> +	case RMI_EXIT_HOST_CALL:
>> +	case RMI_EXIT_PSCI:
>> +		for (int i = 0; i < REC_RUN_GPRS; i++)
>> +			rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
>> +		break;
>> +	}
>
> As mentioned in the patch following (MMIO emulation support), we may be
> able to do this unconditionally for all REC entries, to cover ourselves
> from missing out other cases. The RMM is in charge of taking the
> appropriate action anyways to copy the results back.
>
> Suzuki
>
>> +
>> +	if (kvm_realm_state(vcpu->kvm) != REALM_STATE_ACTIVE)
>> +		return -EINVAL;
>> +
>> +	return rmi_rec_enter(virt_to_phys(rec->rec_page),
>> +			     virt_to_phys(rec->run));
>> +}
>> +
>>  static void free_rec_aux(struct page **aux_pages,
>>  			 unsigned int num_aux)
>>  {