Re: Interface to enable in-kernel hcall handling

On 19.11.2013, at 02:02, Paul Mackerras <paulus@xxxxxxxxx> wrote:

> On Mon, Nov 18, 2013 at 04:31:39PM -0500, Alexander Graf wrote:
>> 
>> On 16.11.2013, at 03:59, Paul Mackerras <paulus@xxxxxxxxx> wrote:
>> 
>>> I have been thinking about adding an interface to PPC KVM's PAPR
>>> emulation to allow userspace to control whether or not individual
>>> hypercalls or groups of hypercalls get handled in the kernel
>>> (vs. being passed up to userspace to be handled there).
>>> 
>>> I can think of a couple of possible interfaces, differing in how the
>>> set of hypercalls to be enabled/disabled is specified.  In each case I
>>> envisage a new VM ioctl which takes an argument specifying which
>>> hypercalls to enable, and possibly another VM ioctl to disable some or
>>> all hypercalls.
>>> 
>>> One is to use the string defined in PAPR for the group of hypercalls.
>>> This is the string that gets included in the ibm,hypertas-functions
>>> property in the /rtas node of the device tree to indicate to the guest
>>> that the group of hypercalls is available to it, for example,
>>> "hcall-pft" for H_ENTER, H_REMOVE, etc., "hcall-tce" for H_PUT_TCE,
>>> H_GET_TCE and friends, and so on.  This way, userspace can iterate
>>> through the strings in the ibm,hypertas-functions property and call
>>> the enable-hypercall ioctl for each one.
>>> 
>>> The second is to pass the individual hypercall number and do them one
>>> by one.  The problem with this one is that it may not make sense to
>>> have some of the hypercalls in a related group handled in the kernel
>>> and others in userspace.
>>> 
>>> The third is to pass a bitmap with one bit per possible hypercall.
>>> 
>>> Any thoughts/opinions on the relative merits of these ideas?
>> 
>> I think either way works. I personally like the string variant the least, as it means we have to parse strings in the kernel. The question whether user space thinks it makes sense to only intercept groups versus individual hypercalls IMHO is not up to us. Maybe user space wants to accelerate H_GET_TCE, but intercept H_PUT_TCE to do magic in the background.
> 
> The problem with splitting a group of related hypercalls between
> kernel and userspace tends to be locking.  Generally there would be
> some data structure that is accessed by the hypercalls in the group.
> If all the hypercalls in the group are implemented in one place then
> you can manage concurrent access using the usual primitives (spinlocks
> or pthread mutexes, typically).  But if some are done in one place and
> some in another then the locking gets way more complex.  I'd prefer to
> avoid that extra complexity.

I agree, but I don't see how the two things are related. If we allow user space to actually emulate a hypercall, we need to make sure that the locking works well enough to at least prevent privilege escalation either way:

Imagine I want to trap H_PUT_TCE, but not H_GET_TCE. I would use the migration protocol to inject TCE entries when I trap to user space. Locking here needs to work regardless of whether it happens on H_PUT_TCE or any other guest-triggered code path.

So the worst that can happen if we don't handle things in-kernel at group granularity is that user space shoots itself in the foot because it simply can't emulate that functionality. I don't see why we'd have to care.

I'd rather keep the interface flexible enough to allow weird use cases than try to be too smart and keep people from doing fun things :).
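
To make the flexibility argument concrete, here is a rough user-space sketch of per-hcall granularity with the PAPR group strings layered on top. It is not a proposed kernel ABI: `struct hcall_mask`, `hcall_set()`, `group_enable()`, `split_tce_example()` and `SKETCH_MAX_HCALL` are all made-up names for illustration, and the group table lists only the hypercalls named in this thread, not the full PAPR groups. The hypercall numbers themselves are the PAPR values, which are multiples of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* PAPR hypercall numbers (always multiples of 4). */
#define H_REMOVE   0x04
#define H_ENTER    0x08
#define H_GET_TCE  0x1c
#define H_PUT_TCE  0x20

/* Assumption: arbitrary upper bound chosen for this sketch only. */
#define SKETCH_MAX_HCALL 0x400
#define BITS_PER_WORD    (8 * sizeof(unsigned long))
#define SKETCH_NWORDS    ((SKETCH_MAX_HCALL / 4 + BITS_PER_WORD) / BITS_PER_WORD)

/* Hypothetical: one bit per hypercall, set = handled in the kernel. */
struct hcall_mask {
    unsigned long bits[SKETCH_NWORDS];
};

/* Route a single hypercall to the kernel (true) or to user space (false). */
static int hcall_set(struct hcall_mask *m, unsigned long nr, bool in_kernel)
{
    if (nr > SKETCH_MAX_HCALL || (nr & 3))
        return -1;                       /* out of range or not a multiple of 4 */
    size_t idx = nr / 4;
    unsigned long bit = 1UL << (idx % BITS_PER_WORD);
    if (in_kernel)
        m->bits[idx / BITS_PER_WORD] |= bit;
    else
        m->bits[idx / BITS_PER_WORD] &= ~bit;
    return 0;
}

static bool hcall_in_kernel(const struct hcall_mask *m, unsigned long nr)
{
    if (nr > SKETCH_MAX_HCALL || (nr & 3))
        return false;
    size_t idx = nr / 4;
    return (m->bits[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1;
}

/* Group strings stay in user space; the table is deliberately incomplete. */
struct hcall_group {
    const char *name;
    const unsigned long *nrs;
    size_t count;
};

static const unsigned long pft_nrs[] = { H_ENTER, H_REMOVE };
static const unsigned long tce_nrs[] = { H_PUT_TCE, H_GET_TCE };
static const struct hcall_group groups[] = {
    { "hcall-pft", pft_nrs, sizeof(pft_nrs) / sizeof(pft_nrs[0]) },
    { "hcall-tce", tce_nrs, sizeof(tce_nrs) / sizeof(tce_nrs[0]) },
};

/* Enable every hypercall of a named group; returns -1 for unknown names. */
static int group_enable(struct hcall_mask *m, const char *name)
{
    for (size_t g = 0; g < sizeof(groups) / sizeof(groups[0]); g++) {
        if (strcmp(groups[g].name, name) != 0)
            continue;
        for (size_t i = 0; i < groups[g].count; i++)
            hcall_set(m, groups[g].nrs[i], true);
        return 0;
    }
    return -1;
}

/* Usage: enable the whole TCE group, then pull H_PUT_TCE back to user space. */
void split_tce_example(void)
{
    struct hcall_mask m;
    memset(&m, 0, sizeof(m));
    assert(group_enable(&m, "hcall-tce") == 0);
    assert(hcall_set(&m, H_PUT_TCE, false) == 0);
    assert(hcall_in_kernel(&m, H_GET_TCE));
    assert(!hcall_in_kernel(&m, H_PUT_TCE));
}
```

With per-hcall granularity underneath, the group behaviour falls out as a user-space convenience, no string parsing is needed in the kernel, and the split H_GET_TCE/H_PUT_TCE case stays expressible.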


Alex
