Re: [RFC] Add virtual SDEI support in qemu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi guys,

On 15/07/2019 14:48, Mark Rutland wrote:
> On Mon, Jul 15, 2019 at 02:41:00PM +0100, Dave Martin wrote:
>> On Sat, Jul 13, 2019 at 05:53:57PM +0800, Guoheyi wrote:
>>> Do it make sense to implement virtual SDEI in qemu? So that we can have the
>>> standard way for guest to handle NMI watchdog, RAS events and something else
>>> which involves SDEI in a physical ARM64 machine.

I think so!


>>> My basic idea is like below:
>>>
>>> 1. Change a few lines of code in kvm to allow unhandled SMC invocations
>>> (like SDEI) to be sent to qemu, with exit reason of KVM_EXIT_HYPERCALL, so
>>> we don't need to add new API.
>>
>> So long as KVM_EXIT_HYPERCALL reports sufficient information so that
>> userspace can identify the cause as an SMC and retrieve the SMC
>> immediate field, this seems feasible.
>>
>> For its own SMCCC APIs, KVM exclusively uses HVC, so rerouting SMC to
>> userspace shouldn't conflict.
> 
> Be _very_ careful here! In systems without EL3 (and without NV), SMC
> UNDEFs rather than trapping to EL2. Given that, we shouldn't build a
> hypervisor ABI that depends on SMC.
> 
> I am strongly of the opinion that (for !NV) we should always use HVC
> here and have KVM appropriately forward calls to userspace, rather than
> trying to use HVC/SMC to distinguish handled-by-kernel and
> handled-by-userspace events.
> 
> For NV, the first guest hypervisor would use SMC to talk to KVM, all
> else being the same.
> 
>> This bouncing of SMCs to userspace would need to be opt-in, otherwise
>> old userspace would see exits that it doesn't know what to do with.
>>
>>> 2. qemu handles supported SDEI calls just as the spec says for what a
>>> hypervisor should do for a guest OS.

[pulled point 3 out to discuss separately]

>> Something like that.
>>
>> Interactions between SDEI and PSCI would need some thought: for example,
>> after PSCI_CPU_ON, the newly online cpu needs to have SDEs masked.
>>
>> One option (suggested to me by James Morse) would be to allow userspace
>> to disable in the in-kernel PSCI implementation and provide its own
>> PSCI to the guest via SMC -- in which case userspace that wants to
>> implement SDEI would have to implement PSCI as well.
> 
> I think this would be the best approach, since it puts userspace in
> charge of everything.
> 
> However, this interacts poorly with FW-based mitigations that we
> implement in hyp. I suspect we'd probably need a mechanism to delegate
> that responsibility back to the kernel, and figure out if that has any
> interaction with thigns that got punted to userspace...

This has come up before:
https://lore.kernel.org/r/59C139D0.3040507@xxxxxxx

I agree Qemu should opt-in to this, it needs to be a feature that is enabled.

I had an early version of something like this for testing SDEI before there was firmware
available. The review feedback from Christoffer was that it should include HVC and SMC,
their immediates, and shouldn't be tied to SMC-CC ranges.

I think this should be a catch-all as Heyi describes to deliver 'unhandled SMC/HVC' to
user-space as hypercall exits. We should include the immediate in the struct.

We can allow Qemu to disable the in-kernel PSCI implementation, which would let it be done
in user-space via this catch-all mechanism. (PSCI in user-space has come up on another
thread recently). The in-kernel PSCI needs to be default-on for backwards compatibility.

As Mark points out, the piece that's left is the 'arch workaround' stuff. We always need
to handle these in the kernel. I don't think these should be routed-back, they should be
un-obtainable by user-space.
Ideally there would be a way for user-space to discover the immediate/x0 values that the
kernel will not deliver via the catch-all. This could be done via some co-processor
register... thing...



>>> 3. For interrupts bound to hypervisor, qemu should stop injecting the IRQ to
>>> guest through KVM, but jump to the registered event handler directly,
>>> including context saving and restoring. Some interrupts like virtual timer
>>> are handled by kvm directly, so we may refuse to bind such interrupts to
>>> SDEI events.

I don't think we'd ever need a physical interrupt to be delivered via Qemu as an SDEI event.
The use-cases for this stuff mean it can be done 'higher-up'. For example, your timer is
probably used as a watchdog. On a real system this may well be a device with an interrupt,
but Qemu could happily emulate a watchdog using some other linux API. (e.g. SIGALRM).



Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux