Re: [RFC PATCH v2 1/1] kvm: Add documentation and ABI/API header for VM introspection

Mihai Donțu <mdontu@xxxxxxxxxxxxxxx> · Thu, 27 Jul 2017 20:06:25 +0300

On Fri, 2017-07-07 at 18:52 +0200, Paolo Bonzini wrote:
> > +2. KVMI_GET_GUEST_INFO
> > +----------------------
> > +
> > +:Architectures: all
> > +:Versions: >= 1
> > +:Parameters: {}
> > +:Returns: ↴
> > +
> > +::
> > +
> > > > +	struct kvmi_get_guest_info_reply {
> > > > +		__s32 err;
> > > > +		__u16 vcpu_count;
> > > > +		__u16 padding;
> > > > +		__u64 tsc_speed;
> > > > +	};
> > +
> > +Returns the number of online vcpus, and the TSC frequency in HZ, if supported
> > +by the architecture (otherwise is 0).
> 
> Can the TSC frequency be specified only if the guest is using TSC
> scaling?  Defining the TSC frequency on older hosts is a bit tricky.  0
> would be the host.
> 
> Maybe define the second padding to be
> 
> 	__u16 arch;
> 
> (0 = x86) followed by an arch-specific payload.

We use the TSC to compute some performance numbers, but only when we're
debugging reported issues. Should be OK if the TSC information is not
available. We'll manage in some other way.

> > +10. KVMI_GET_XSAVE_INFO
> > +-----------------------
> > +
> > +:Architectures: x86
> > +:Versions: >= 1
> > +:Parameters: ↴
> > +
> > +::
> > +
> > +	struct kvmi_xsave_info {
> > +		__u16 vcpu;
> > +		__u16 padding[3];
> > +	};
> > +
> > +:Returns: ↴
> > +
> > +::
> > +
> > +	struct kvmi_xsave_info_reply {
> > +		__s32 err;
> > +		__u32 size;
> > +	};
> > +
> > +Returns the xstate size for the specified vCPU.
> 
> Could this be replaced by a generic CPUID accessor?

Absolutely. I wonder if we should only made certain leafs acessible,
part of the whole "make the least amount of information available"
security philosophy, though I'm not aware of any attacks that can be
mounted on the host just by knowing too many things about a guest.

> > +11. KVMI_GET_PAGE_ACCESS
> > +------------------------
> > +
> > +:Architectures: all
> > +:Versions: >= 1
> > +:Parameters: ↴
> > +
> > +::
> > +
> > +	struct kvmi_get_page_access {
> > +		__u16 vcpu;
> > +		__u16 padding[3];
> > +		__u64 gpa;
> > +	};
> > +
> > +:Returns: ↴
> > +
> > +::
> > +
> > +	struct kvmi_get_page_access_reply {
> > +		__s32 err;
> > +		__u32 access;
> > +	};
> > +
> > +Returns the spte flags (rwx - present, write & user) for the specified
> > +vCPU and guest physical address.
> > +
> > +12. KVMI_SET_PAGE_ACCESS
> > +------------------------
> > +
> > +:Architectures: all
> > +:Versions: >= 1
> > +:Parameters: ↴
> > +
> > +::
> > +
> > +	struct kvmi_set_page_access {
> > +		__u16 vcpu;
> > +		__u16 padding;
> > +		__u32 access;
> > +		__u64 gpa;
> > +	};
> > +
> > +:Returns: ↴
> > +
> > +::
> > +
> > +	struct kvmi_error_code {
> > +		__s32 err;
> > +		__u32 padding;
> > +	};
> > +
> > +Sets the spte flags (rwx - present, write & user) - access - for the specified
> > +vCPU and guest physical address.
> 
> rwx or pwu?  I suppose RWX.  Maybe #define the constants in the
> documentation.
> 
> Also, it should be noted here that the spte flags are ANDed with
> whatever is provided by userspace, because there could be readonly
> memslots, and that some access combinations could fail (--x) or will
> surely fail (-wx for example).

We are closely looking into how KVM's MMU works and see how we'd go
about extending it so as to make everyone happy (qemu and a possible
introspection tool).

As for the permitted page access flags, we have (thus far) kept away
from invalid combinations on older arch-es (--x on pre-SandyBridge, for
example). However, it would be very nice to have arch-specific code
that does a validation ahead of time (ie. before the guest is re-
entered) so that an introspection tool can use an alternative approach.

> > +13. KVMI_INJECT_PAGE_FAULT
> > +--------------------------
> > +
> > +:Architectures: x86
> > +:Versions: >= 1
> > +:Parameters: ↴
> > +
> > +::
> > +
> > +	struct kvmi_page_fault {
> > +		__u16 vcpu;
> > +		__u16 padding;
> > +		__u32 error;
> 
> error_code, I guess?  Why not a generic inject exception message?

OK, we will rework the interface to be used for generic exception
injection. Maybe we can fold the breakpoint injection into it too.

> > +		__u64 gva;
> > +	};
> > +
> > +:Returns: ↴
> > +
> > +::
> > +
> > +	struct kvmi_error_code {
> > +		__s32 err;
> > +		__u32 padding;
> > +	};
> > +
> > +Injects a vCPU page fault with the specified guest virtual address and
> > +error code.
> > +
> > +5. KVMI_EVENT_USER_CALL
> > +-----------------------
> 
> Please rename this to KVMI_EVENT_HCALL or HYPERCALL or VMCALL.
> 
> > +
> > +:Architectures: x86
> > +:Versions: >= 1
> > +:Parameters: ↴
> > +
> > +::
> > +
> > +	struct kvmi_event_x86;
> > +
> > +:Returns: ↴
> > +
> > +::
> > +
> > +	struct kvmi_event_x86_reply;
> > +
> > +This event is sent on a user hypercall and the introspection has already
> > +already been enabled for this kind of event (KVMI_CONTROL_EVENTS).
> > +
> > +kvmi_event_x86 is sent to the introspector, which can respond with the
> > +KVMI_EVENT_ACTION_SET_REGS bit set in 'actions', instructing the host
> > +kernel to override the general purpose registers using the values from
> > +introspector (regs).
> 
> Does KVMI_EVENT_ACTION_SET_REGS bypass the hypercall?  Why is
> KVMI_EVENT_ACTION_ALLOW not allowed?  As before, I'd prefer
> SKIP/RETRY/ALLOW/CRASH as the actions.

No, we use SET_REGS only as a means to communicate with the guest code
that did the hypercall. We don't specify an action because, in our
specific case, saw no use for it.

Apropos: if possible, we'd like to keep the Xen convention for this
hypercall. It doesn't appear to interfere with either KVM or Hyper-V
(rax = 34 [__HYPERVISOR_hvm_op], rbx/rdi = 24
[HVMOP_guest_request_vm_event] - depends on wether long mode is
enabled).

> > +7. KVMI_EVENT_TRAP
> > +------------------
> > +
> > +:Architectures: x86
> > +:Versions: >= 1
> > +:Parameters: ↴
> > +
> > +::
> > +
> > +	struct kvmi_event_x86;
> > +	struct kvmi_event_trap {
> > +		__u32 vector;
> > +		__u32 type;
> > +		__u32 err;
> 
> error_code?
> 
> > +		__u32 padding;
> > +		__u64 cr2;
> > +	};
> > +
> > +:Returns: ↴
> > +
> > +::
> > +
> > +	struct kvmi_event_x86_reply;
> > +
> > +This event is sent if a trap will be delivered to the guest (page fault,
> > +breakpoint, etc.) and the introspection has already enabled the reports
> > +for this kind of event (KVMI_CONTROL_EVENTS).
> > +
> > +This is used to inform the introspector of all pending traps giving it
> > +a chance to determine if it should try again later in case a previous
> > +KVMI_INJECT_PAGE_FAULT/KVMI_INJECT_BREAKPOINT command has been overwritten
> > +by an interrupt picked up during guest reentry.
> > +
> > +kvmi_event_x86, exception/interrupt number (vector), exception/interrupt
> > +type, exception code (err) and CR2 are sent to the introspector, which can
> > +respond with the KVMI_EVENT_ACTION_SET_REGS bit set in 'actions', instructing
> > +the host kernel to override the general purpose registers using the values
> > +from introspector (regs).
> 
> Here I think the actions could be RETRY/ALLOW/CRASH only, with "set
> regs" done as a separate command.
> 
> Some x86 exceptions are faults, others are traps.  Traps should not
> allow RETRY as an action.  There should be an input telling the
> introspector if retrying is allowed.
> 
> How are dr6/dr7 handled around breakpoints?

Afaik, the debug registers should remain untouched, but I will ask my
colleagues with better knowledge of this.

Thanks,

-- 
Mihai Donțu