On 27/11/17 19:26, Jim Mattson wrote:
I think we need some way for userspace to indicate whether or not it
can deal with pending events (with side effects recorded in
kvm_vcpu_events.reserved[0] and kvm_vcpu_events.reserved[1]) when it
issues a KVM_GET_VCPU_EVENTS ioctl. That seems to argue for a new flag
bit in kvm_vcpu_events.flags (as input to KVM_GET_VCPU_EVENTS) and a
capability indicating that the flag can be set by userspace.
Yep I agree. I will do the relevant changes for the next version of this
series.
Thanks,
-Liran
On Thu, Nov 23, 2017 at 10:45 AM, Liran Alon <LIRAN.ALON@xxxxxxxxxx> wrote:
On 22/11/17 23:00, Jim Mattson wrote:
On Tue, Nov 21, 2017 at 7:30 AM, Liran Alon <liran.alon@xxxxxxxxxx> wrote:
Being symmetrical, injecting an exception from user-mode should
consider injected exception as in "injected" state and not in
"pending" state.
I disagree with this contention. Suppose, for example, that we are
executing in a nested context (i.e. vmcs02 is live) and usermode
"injects" a page-fault. The page fault may be delivered on L2's IDT or
it may cause an emulated VM-exit from L2 to L1, depending on the page
fault error code, the exception bitmap in the vmcs12, and the
page-fault error-code mask and match fields in the vmcs12. If the page
fault is delivered on L2's IDT, then the exception can be considered
as "injected," with the CR2 side effect already processed. However, if
the page fault causes an emulated VM-exit from L2 to L1, then it must
be considered as "pending" and the CR2 side effect must not have been
processed.
I understand you are referring here to the FIXME comment that is present on
nested_vmx_check_exception()?
So, where do we find the new CR2 value? Admittedly, the existing
KVM_SET_VCPU_EVENTS API is broken and appears to assume that the side
effects have already been performed for exceptions injected from
userspace, though this assumption isn't documented AFAICT.
Fortunately, there's enough padding in the kvm_vcpu_events structure
to fix the API (with the possible exception of injected #VE*): one bit
to indicate that userspace is providing the side effects in the events
structure, and 64 bits for the new CR2 value (#PF) or the new DR6 bits
(#DB).
I see. Then what do you think about the following change:
1. Rename kvm_vcpu_events.exception.injected to
kvm_vcpu_events.exception.raised and remain still to be the logical OR of
"exception.pending | exception.injected" as of today (to not break backwards
compatibility).
2. Add a new flag to kvm_vcpu_events.flags to indicate if
kvm_vcpu_events.exception.raised is actually exception.pending or
exception.injected (they are mutually exclusive). A value of 0 will be
considered as "injected" to preserve backwards compatibility.
3. Use kvm_vcpu_events.reserved[0] and kvm_vcpu_events.reserved[1] for
exception_extra_info which will be either CR2 for #PF or DR6 for #DB.
4. Add to kvm_queued_exception() a u64 exception_extra_info that will either
be CR2 for #PF or DR6 for #DB. Make sure that these will be set on relevant
places and filled to vcpu.arch.cr2/VMCS only on inject_pending_event()
injection of a pending exception.
I think that in the above proposal, we don't need to be conditional on a new
capability because old user-space shouldn't be affected.
(They will still get same value in kvm_vcpu_events.exception.raised and the
rest were ignored fields).
What do you think?
Thanks for the excellent review!
-Liran
* One could argue that userspace should not be delivering a #VE
directly, but should be injecting an EPT Violation instead.