On Mon, Jan 18, 2021, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > On Fri, Jan 15, 2021, Vitaly Kuznetsov wrote:
> >> Memory slots are allocated dynamically when added, so the only real
> >> limitation in KVM is the 'id_to_index' array, which is 'short'. Define
> >> KVM_USER_MEM_SLOTS to the maximum possible value in the arch-neutral
> >> include/linux/kvm_host.h; architectures can still override the setting
> >> if needed.
> >
> > Leaving the max number of slots nearly unbounded is probably a bad idea.
> > If my math is not completely wrong, this would let userspace allocate 6MB
> > of kernel memory per VM. Actually, worst case scenario would be 12MB
> > since modifying memslots temporarily has two allocations.
>
> Yea, I had that thought too, but on the other hand, if your VMM went rogue
> and is trying to eat all your memory, how is allocating 32k memslots
> different from e.g. creating 64 VMs with 512 slots each? We use
> GFP_KERNEL_ACCOUNT to allocate memslots (and other per-VM stuff) so
> e.g. cgroup limits should work ...

I see it as an easy way to mitigate the damage. E.g. if a containers use
case is spinning up hundreds of VMs and something goes awry in the config,
it would be the difference between consuming tens of MBs and hundreds of
MBs. Cgroup limits should also be in play, but defense in depth and all
that.

> > If we remove the arbitrary limit, maybe replace it with a module param
> > with a sane default?
>
> This can be a good solution indeed. The only question then is what should
> we pick as the default? It seems to me this can be KVM_MAX_VCPUS
> dependent, e.g. 4 x KVM_MAX_VCPUS would suffice.

Hrm, I don't love tying it to KVM_MAX_VCPUS. Other than SynIC, are there
any other features/modes/configurations that cause the number of memslots
to grow with the number of vCPUs?

But, limiting via a module param does effectively require using
KVM_MAX_VCPUS, otherwise everyone that might run Windows guests would have
to override the default and thereby defeat the purpose of limiting by
default.

Were you planning on adding a capability to check for the new and improved
memslots limit, e.g. to know whether or not KVM might die on a large VM?
If so, requiring the VMM to call an ioctl() to set a higher (or lower?)
limit would be another option. That wouldn't have the same permission
requirements as a module param, but it would likely be a more effective
safeguard in practice, e.g. use cases with a fixed number of memslots or a
well-defined upper bound could use the capability to limit themselves.

Thoughts? An ioctl() feels a little over-engineered, but I suspect that
adding a module param that defaults to N*KVM_MAX_VCPUS will be a waste,
e.g. no one will ever touch the param and we'll end up with dead,
rarely-tested code.
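
For concreteness, the module param version I have in mind would be roughly
the below. Completely untested sketch, and the param name and helper are
invented, but it shows how the limit could be clamped so a crazy param
value can't blow past the 'short' id_to_index array:

	/*
	 * Untested sketch for virt/kvm/kvm_main.c -- 'max_user_memslots'
	 * and kvm_memslot_id_valid() are invented for illustration.
	 */
	static unsigned int max_user_memslots = 4 * KVM_MAX_VCPUS;
	module_param(max_user_memslots, uint, 0444);
	MODULE_PARM_DESC(max_user_memslots,
			 "Maximum number of user memslots per VM");

	static bool kvm_memslot_id_valid(u32 id)
	{
		/*
		 * Clamp to KVM_USER_MEM_SLOTS so an absurd param value
		 * can't overflow the 'short' id_to_index array.
		 */
		return id < min_t(u32, max_user_memslots, KVM_USER_MEM_SLOTS);
	}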
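
And the capability angle, from the VMM side, would look something like
this. Note, KVM_CAP_NR_MEMSLOTS is real and already queryable via
KVM_CHECK_EXTENSION; wiring it up to KVM_ENABLE_CAP to actually set a
per-VM limit is the hypothetical part:

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	/*
	 * Userspace sketch.  Querying KVM_CAP_NR_MEMSLOTS works today;
	 * using KVM_ENABLE_CAP to raise/lower the per-VM limit is
	 * invented here purely to show the shape of the API.
	 */
	static int set_memslot_limit(int vm_fd, __u64 limit)
	{
		struct kvm_enable_cap cap = {
			.cap  = KVM_CAP_NR_MEMSLOTS,
			.args = { limit },
		};

		/* The default limit is already reported today. */
		if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS) < 0)
			return -1;

		/* Hypothetical: ask KVM to enforce 'limit' for this VM. */
		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}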