Re: [PATCH RFC 3/4] KVM: Define KVM_USER_MEM_SLOTS in arch-neutral include/linux/kvm_host.h

Igor Mammedov <imammedo@xxxxxxxxxx> writes:

> On Tue, 19 Jan 2021 09:20:42 -0800
> Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
>> 
>> Were you planning on adding a capability to check for the new and improved
>> memslots limit, e.g. to know whether or not KVM might die on a large VM?
>> If so, requiring the VMM to call an ioctl() to set a higher (or lower?) limit
>> would be another option.  That wouldn't have the same permission requirements as
>> a module param, but it would likely be a more effective safeguard in practice,
>> e.g. use cases with a fixed number of memslots or a well-defined upper bound
>> could use the capability to limit themselves.
> Currently QEMU uses KVM_CAP_NR_MEMSLOTS to get the limit, and depending on
> where the limit is reached it either fails gracefully (i.e. it checks whether
> a free slot is available before allocating one) or aborts (where it tries to
> allocate a slot without checking).

FWIW, the 'synic problem' causes it to abort.

> New ioctl() seems redundant as we already have an upper limit check
> (unless it would allow going over that limit, which in turn defeats the
> purpose of the limit).
>

Right, I didn't plan to add any new CAP as what we already have should
be enough to query the limits. Having an ioctl to set the upper limit
seems complicated: with device and CPU hotplug it may not be easy to
guess the right value upfront, so VMMs will likely just call it to raise
the limit from their memslot modification code, and it won't provide any
new protection.


>> Thoughts?  An ioctl() feels a little over-engineered, but I suspect that adding
>> a module param that defaults to N*KVM_MAX_VCPUS will be a waste, e.g. no one
>> will ever touch the param and we'll end up with dead, rarely-tested code.

Alternatively, we can hard-code KVM_USER_MEM_SLOTS to N*KVM_MAX_VCPUS so
no new parameter is needed, but personally I'd prefer to have it
configurable (in case we decide to set it to something lower than
SHRT_MAX, of course) even if it won't be altered very often (which is a
good thing for 'general purpose' usage, right?). First, it would allow
tightening the limit for some very specific deployments (e.g. FaaS/
Firecracker-style) to, say, '20', which should be enough. Second, we may
be overlooking some configurations where the number of memslots actually
depends on the number of vCPUs; nobody has complained so far just
because these configurations use a fairly small number and the ceiling
hasn't been hit yet.

One more spare thought: per-vCPU memslots may come in handy if someone
decides to move some of the KVM PV features to userspace, e.g. I can
imagine an attempt to move async_pf out of the kernel.

-- 
Vitaly



