On 22/11/17 23:00, Jim Mattson wrote:
On Tue, Nov 21, 2017 at 7:30 AM, Liran Alon <liran.alon@xxxxxxxxxx> wrote:
Being symmetrical, injecting an exception from user-mode should
consider injected exception as in "injected" state and not in
"pending" state.
I disagree with this contention. Suppose, for example, that we are
executing in a nested context (i.e. vmcs02 is live) and usermode
"injects" a page-fault. The page fault may be delivered on L2's IDT or
it may cause an emulated VM-exit from L2 to L1, depending on the page
fault error code, the exception bitmap in the vmcs12, and the
page-fault error-code mask and match fields in the vmcs12. If the page
fault is delivered on L2's IDT, then the exception can be considered
as "injected," with the CR2 side effect already processed. However, if
the page fault causes an emulated VM-exit from L2 to L1, then it must
be considered as "pending" and the CR2 side effect must not have been
processed.
I understand you are referring here to the FIXME comment that is present
on nested_vmx_check_exception()?
So, where do we find the new CR2 value? Admittedly, the existing
KVM_SET_VCPU_EVENTS API is broken and appears to assume that the side
effects have already been performed for exceptions injected from
userspace, though this assumption isn't documented AFAICT.
Fortunately, there's enough padding in the kvm_vcpu_events structure
to fix the API (with the possible exception of injected #VE*): one bit
to indicate that userspace is providing the side effects in the events
structure, and 64 bits for the new CR2 value (#PF) or the new DR6 bits
(#DB).
I see. Then what do you think about the following change:
1. Rename kvm_vcpu_events.exception.injected to
kvm_vcpu_events.exception.raised and remain still to be the logical OR
of "exception.pending | exception.injected" as of today (to not break
backwards compatibility).
2. Add a new flag to kvm_vcpu_events.flags to indicate if
kvm_vcpu_events.exception.raised is actually exception.pending or
exception.injected (they are mutually exclusive). A value of 0 will be
considered as "injected" to preserve backwards compatibility.
3. Use kvm_vcpu_events.reserved[0] and kvm_vcpu_events.reserved[1] for
exception_extra_info which will be either CR2 for #PF or DR6 for #DB.
4. Add to kvm_queued_exception() a u64 exception_extra_info that will
either be CR2 for #PF or DR6 for #DB. Make sure that these will be set
on relevant places and filled to vcpu.arch.cr2/VMCS only on
inject_pending_event() injection of a pending exception.
I think that in the above proposal, we don't need to be conditional on a
new capability because old user-space shouldn't be affected.
(They will still get same value in kvm_vcpu_events.exception.raised and
the rest were ignored fields).
What do you think?
Thanks for the excellent review!
-Liran
* One could argue that userspace should not be delivering a #VE
directly, but should be injecting an EPT Violation instead.