I think we need some way for userspace to indicate whether or not it can deal with pending events (with side effects recorded in kvm_vcpu_events.reserved[0] and kvm_vcpu_events.reserved[1]) when it issues a KVM_GET_VCPU_EVENTS ioctl. That seems to argue for a new flag bit in kvm_vcpu_events.flags (as input to KVM_GET_VCPU_EVENTS) and a capability indicating that the flag can be set by userspace. On Thu, Nov 23, 2017 at 10:45 AM, Liran Alon <LIRAN.ALON@xxxxxxxxxx> wrote: > > > On 22/11/17 23:00, Jim Mattson wrote: >> >> On Tue, Nov 21, 2017 at 7:30 AM, Liran Alon <liran.alon@xxxxxxxxxx> wrote: >> >>> Being symmetrical, injecting an exception from user-mode should >>> consider injected exception as in "injected" state and not in >>> "pending" state. >> >> >> I disagree with this contention. Suppose, for example, that we are >> executing in a nested context (i.e. vmcs02 is live) and usermode >> "injects" a page-fault. The page fault may be delivered on L2's IDT or >> it may cause an emulated VM-exit from L2 to L1, depending on the page >> fault error code, the exception bitmap in the vmcs12, and the >> page-fault error-code mask and match fields in the vmcs12. If the page >> fault is delivered on L2's IDT, then the exception can be considered >> as "injected," with the CR2 side effect already processed. However, if >> the page fault causes an emulated VM-exit from L2 to L1, then it must >> be considered as "pending" and the CR2 side effect must not have been >> processed. > > I understand you are referring here to the FIXME comment that is present on > nested_vmx_check_exception()? >> >> >> So, where do we find the new CR2 value? Admittedly, the existing >> KVM_SET_VCPU_EVENTS API is broken and appears to assume that the side >> effects have already been performed for exceptions injected from >> userspace, though this assumption isn't documented AFAICT. >> Fortunately, there's enough padding in the kvm_vcpu_events structure >> to fix the API (with the possible exception of injected #VE*): one bit >> to indicate that userspace is providing the side effects in the events >> structure, and 64 bits for the new CR2 value (#PF) or the new DR6 bits >> (#DB). > > I see. Then what do you think about the following change: > 1. Rename kvm_vcpu_events.exception.injected to > kvm_vcpu_events.exception.raised and remain still to be the logical OR of > "exception.pending | exception.injected" as of today (to not break backwards > compatibility). > 2. Add a new flag to kvm_vcpu_events.flags to indicate if > kvm_vcpu_events.exception.raised is actually exception.pending or > exception.injected (they are mutually exclusive). A value of 0 will be > considered as "injected" to preserve backwards compatibility. > 3. Use kvm_vcpu_events.reserved[0] and kvm_vcpu_events.reserved[1] for > exception_extra_info which will be either CR2 for #PF or DR6 for #DB. > 4. Add to kvm_queued_exception() a u64 exception_extra_info that will either > be CR2 for #PF or DR6 for #DB. Make sure that these will be set on relevant > places and filled to vcpu.arch.cr2/VMCS only on inject_pending_event() > injection of a pending exception. > > I think that in the above proposal, we don't need to be conditional on a new > capability because old user-space shouldn't be affected. > (They will still get same value in kvm_vcpu_events.exception.raised and the > rest were ignored fields). > > What do you think? > > Thanks for the excellent review! > -Liran > >> >> * One could argue that userspace should not be delivering a #VE >> directly, but should be injecting an EPT Violation instead. >> >