Re: [PATCH v5 18/43] arm64: RME: Handle realm enter/exit

Steven Price <steven.price@xxxxxxx> · Fri, 29 Nov 2024 12:18:36 +0000

Hi Suzuki,

Sorry for the very slow response to this. Coming back to this I'm having
doubts, see below.

On 17/10/2024 14:00, Suzuki K Poulose wrote:
> On 04/10/2024 16:27, Steven Price wrote:
>> Entering a realm is done using a SMC call to the RMM. On exit the
>> exit-codes need to be handled slightly differently to the normal KVM
>> path so define our own functions for realm enter/exit and hook them
>> in if the guest is a realm guest.
>>
>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
...
>> diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
>> new file mode 100644
>> index 000000000000..e96ea308212c
>> --- /dev/null
>> +++ b/arch/arm64/kvm/rme-exit.c
...
>> +static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
>> +{
>> +    struct kvm *kvm = vcpu->kvm;
>> +    struct realm *realm = &kvm->arch.realm;
>> +    struct realm_rec *rec = &vcpu->arch.rec;
>> +    unsigned long base = rec->run->exit.ripas_base;
>> +    unsigned long top = rec->run->exit.ripas_top;
>> +    unsigned long ripas = rec->run->exit.ripas_value;
>> +    unsigned long top_ipa;
>> +    int ret;
>> +
>> +    if (!realm_is_addr_protected(realm, base) ||
>> +        !realm_is_addr_protected(realm, top - 1)) {
>> +        kvm_err("Invalid RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
>> +            base, top, ripas);
>> +        return -EINVAL;
>> +    }
>> +
>> +    kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
>> +                   kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
> 
> I think we also need to filter the request for RIPAS_RAM, by consulting
> if the "range" is backed by a memslot or not. If they are not, we should
> reject the request with a response flag set in run.enter.flags.

It's an interesting API question. At the moment there is no requirement
to have an active memslot to set the RIPAS - this is true both during
the setup by the VMM and at run time.

In theory a VMM can create/destroy memslots while the guest is running.
So absence of a memslot doesn't actually imply that the RIPAS change
should be rejected. Obviously with realms this is tricky because when
destroying a memslot that's in use KVM would rip those pages out from
the guest and it would require guest cooperation to restore those pages
(transition to RIPAS_EMPTY and back to RIPAS_RAM). But it's not
something that has been prohibited so far.

On the other hand this is a clear way for a (malicious/buggy) guest to
consume a fair bit of host RAM: by transitioning (sparse) pages that are
not in a memslot to RIPAS_RAM it forces KVM to allocate RTT pages to
delegate to the RMM. But we do exit to the VMM, so this is solvable in
the VMM (by killing a misbehaving guest). The number of pages this would
consume per exit is also fairly small.

So my instinct is that we shouldn't impose that requirement.
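
For concreteness, the filter being discussed boils down to asking whether
every byte of [base, top) is covered by a memslot. Below is a rough
user-space model in C, not kernel code: struct model_slot and
range_backed_by_slots() are made-up names for illustration, and the slots
are assumed to be sorted by base and non-overlapping. A real
implementation would walk the KVM memslots instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memslot: covers IPAs [base, base + size). */
struct model_slot {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if every byte of [base, top) is covered by some slot.
 * Assumes the slots are sorted by base and do not overlap.
 */
static bool range_backed_by_slots(const struct model_slot *slots, size_t nr,
				  uint64_t base, uint64_t top)
{
	uint64_t cur = base;	/* lowest address not yet known to be covered */

	for (size_t i = 0; i < nr && cur < top; i++) {
		uint64_t start = slots[i].base;
		uint64_t end = start + slots[i].size;

		if (end <= cur)
			continue;	/* slot lies entirely below the cursor */
		if (start > cur)
			return false;	/* hole between the cursor and this slot */
		cur = end;
	}

	return cur >= top;
}
```

In this model a RIPAS_RAM request for which the function returns false is
the case under discussion: reject it, or let it through and leave the
policy to the VMM.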

Any thoughts?

> As for EMPTY requests, if the guest wants to explicitly mark any range
> as EMPTY, it doesn't matter, as long as it is within the protected IPA.
> (even though they may be EMPTY in the first place).
> 
>> +    write_lock(&kvm->mmu_lock);
>> +    ret = realm_set_ipa_state(vcpu, base, top, ripas, &top_ipa);
>> +    write_unlock(&kvm->mmu_lock);
>> +
>> +    WARN(ret && ret != -ENOMEM,
>> +         "Unable to satisfy RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
>> +         base, top, ripas);
>> +
>> +    /* Exit to VMM to complete the change */
>> +    kvm_prepare_memory_fault_exit(vcpu, base, top_ipa - base, false, false,
>> +                      ripas == RMI_RAM);
> 
> Again this may only be needed if the range is backed by a memslot ?
> Otherwise the VMM has nothing to do.

Assuming the above, then the VMM would be the one to kill a misbehaving
guest, so would need a notification.
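
As a sketch of what "solvable in the VMM" could mean, here is one possible
policy, modelled in plain C: count the RIPAS_RAM bytes requested outside
any memslot and give up on the guest once a budget is exceeded. None of
this is mandated by the patch. The struct and function names, and the
idea of a byte budget, are assumptions for illustration. The to_ram field
stands in for the last argument to kvm_prepare_memory_fault_exit()
(ripas == RMI_RAM), and whether the range is backed is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what the VMM reads from the memory fault exit. */
struct model_fault_exit {
	uint64_t gpa;
	uint64_t size;
	bool to_ram;	/* models ripas == RMI_RAM */
};

/* Hypothetical per-guest accounting kept by the VMM. */
struct model_guest_policy {
	uint64_t unbacked_bytes;	/* RIPAS_RAM bytes requested outside memslots */
	uint64_t unbacked_limit;	/* budget before the guest is killed */
};

enum model_verdict { MODEL_RESUME, MODEL_KILL };

static enum model_verdict
model_handle_ripas_exit(struct model_guest_policy *pol,
			const struct model_fault_exit *ev, bool backed)
{
	/* Transitions to EMPTY, or to RAM inside a memslot, are not charged. */
	if (!ev->to_ram || backed)
		return MODEL_RESUME;

	pol->unbacked_bytes += ev->size;

	return pol->unbacked_bytes > pol->unbacked_limit ? MODEL_KILL
							 : MODEL_RESUME;
}
```

Either way the VMM only gets to make this decision if KVM exits to it for
ranges that have no memslot.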

Thanks,
Steve

>> +
>> +    return 0;
>> +}
>> +
>> +static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
>> +{
>> +    struct realm_rec *rec = &vcpu->arch.rec;
>> +
>> +    __vcpu_sys_reg(vcpu, CNTV_CTL_EL0) = rec->run->exit.cntv_ctl;
>> +    __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0) = rec->run->exit.cntv_cval;
>> +    __vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = rec->run->exit.cntp_ctl;
>> +    __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = rec->run->exit.cntp_cval;
>> +
>> +    kvm_realm_timers_update(vcpu);
>> +}
>> +
>> +/*
>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>> + * proper exit to userspace.
>> + */
>> +int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
>> +{
>> +    struct realm_rec *rec = &vcpu->arch.rec;
>> +    u8 esr_ec = ESR_ELx_EC(rec->run->exit.esr);
>> +    unsigned long status, index;
>> +
>> +    status = RMI_RETURN_STATUS(rec_run_ret);
>> +    index = RMI_RETURN_INDEX(rec_run_ret);
>> +
>> +    /*
>> +     * If a PSCI_SYSTEM_OFF request raced with a vcpu executing, we might
>> +     * see the following status code and index indicating an attempt to run
>> +     * a REC when the RD state is SYSTEM_OFF.  In this case, we just need to
>> +     * return to user space which can deal with the system event or will try
>> +     * to run the KVM VCPU again, at which point we will no longer attempt
>> +     * to enter the Realm because we will have a sleep request pending on
>> +     * the VCPU as a result of KVM's PSCI handling.
>> +     */
>> +    if (status == RMI_ERROR_REALM && index == 1) {
>> +        vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
>> +        return 0;
>> +    }
>> +
>> +    if (rec_run_ret)
>> +        return -ENXIO;
>> +
>> +    vcpu->arch.fault.esr_el2 = rec->run->exit.esr;
>> +    vcpu->arch.fault.far_el2 = rec->run->exit.far;
>> +    vcpu->arch.fault.hpfar_el2 = rec->run->exit.hpfar;
>> +
>> +    update_arch_timer_irq_lines(vcpu);
>> +
>> +    /* Reset the emulation flags for the next run of the REC */
>> +    rec->run->enter.flags = 0;
>> +
>> +    switch (rec->run->exit.exit_reason) {
>> +    case RMI_EXIT_SYNC:
>> +        return rec_exit_handlers[esr_ec](vcpu);
>> +    case RMI_EXIT_IRQ:
>> +    case RMI_EXIT_FIQ:
>> +        return 1;
>> +    case RMI_EXIT_PSCI:
>> +        return rec_exit_psci(vcpu);
>> +    case RMI_EXIT_RIPAS_CHANGE:
>> +        return rec_exit_ripas_change(vcpu);
>> +    }
>> +
>> +    kvm_pr_unimpl("Unsupported exit reason: %u\n",
>> +              rec->run->exit.exit_reason);
>> +    vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> +    return 0;
>> +}
>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>> index 1fa9991d708b..4c0751231810 100644
>> --- a/arch/arm64/kvm/rme.c
>> +++ b/arch/arm64/kvm/rme.c
>> @@ -899,6 +899,25 @@ void kvm_destroy_realm(struct kvm *kvm)
>>       kvm_free_stage2_pgd(&kvm->arch.mmu);
>>   }
>> +
>> +int kvm_rec_enter(struct kvm_vcpu *vcpu)
>> +{
>> +    struct realm_rec *rec = &vcpu->arch.rec;
>> +
>> +    switch (rec->run->exit.exit_reason) {
>> +    case RMI_EXIT_HOST_CALL:
>> +    case RMI_EXIT_PSCI:
>> +        for (int i = 0; i < REC_RUN_GPRS; i++)
>> +            rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
>> +        break;
>> +    }
> 
> As mentioned in the patch following (MMIO emulation support), we may be
> able to do this unconditionally for all REC entries, to cover ourselves
> from missing out other cases. The RMM is in charge of taking the
> appropriate action anyways to copy the results back.
> 
> Suzuki
> 
>> +
>> +    if (kvm_realm_state(vcpu->kvm) != REALM_STATE_ACTIVE)
>> +        return -EINVAL;
>> +
>> +    return rmi_rec_enter(virt_to_phys(rec->rec_page),
>> +                 virt_to_phys(rec->run));
>> +}
>> +
>>   static void free_rec_aux(struct page **aux_pages,
>>                unsigned int num_aux)
>>   {