On Tue, 19 Jan 2021 09:20:42 -0800
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> On Mon, Jan 18, 2021, Vitaly Kuznetsov wrote:
> > Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> >
> > > On Fri, Jan 15, 2021, Vitaly Kuznetsov wrote:
> > >> Memory slots are allocated dynamically when added, so the only real
> > >> limitation in KVM is the 'id_to_index' array, which is 'short'. Define
> > >> KVM_USER_MEM_SLOTS to the maximum possible value in the arch-neutral
> > >> include/linux/kvm_host.h; architectures can still override the setting
> > >> if needed.
> > >
> > > Leaving the max number of slots nearly unbounded is probably a bad idea.
> > > If my math is not completely wrong, this would let userspace allocate
> > > 6mb of kernel memory per VM.  Actually, the worst case scenario would
> > > be 12mb, since modifying memslots temporarily has two allocations.
> >
> > Yea, I had this thought too, but on the other hand, if your VMM went
> > rogue and is trying to eat all your memory, how is allocating 32k
> > memslots different from e.g. creating 64 VMs with 512 slots each? We use
> > GFP_KERNEL_ACCOUNT to allocate memslots (and other per-VM stuff), so
> > e.g. cgroup limits should work...
>
> I see it as an easy way to mitigate the damage.  E.g. if a containers use
> case is spinning up hundreds of VMs and something goes awry in the config,
> it would be the difference between consuming tens of MBs and hundreds of
> MBs.  Cgroup limits should also be in play, but defense in depth and all
> that.
>
> > > If we remove the arbitrary limit, maybe replace it with a module param
> > > with a sane default?
> >
> > This can be a good solution indeed. The only question then is what
> > should we pick as the default? It seems to me this can be KVM_MAX_VCPUS
> > dependent, e.g. 4 x KVM_MAX_VCPUS would suffice.
>
> Hrm, I don't love tying it to KVM_MAX_VCPUS.  Other than SynIC, are there
> any other features/modes/configurations that cause the number of memslots
> to grow

(NV)DIMMs in QEMU also consume a slot per device but do not depend on the
number of vCPUs. Due to the current slot limit, only 256 DIMMs are allowed.
But if vCPUs start consuming extra memslots, they will contend with DIMMs
for the available slots.

> with the number of vCPUs?  But, limiting via a module param does
> effectively require using KVM_MAX_VCPUS, otherwise everyone that might
> run Windows guests would have to override the default and thereby defeat
> the purpose of limiting by default.
>
> Were you planning on adding a capability to check for the new and improved
> memslots limit, e.g. to know whether or not KVM might die on a large VM?
> If so, requiring the VMM to call an ioctl() to set a higher (or lower?)
> limit would be another option.  That wouldn't have the same permission
> requirements as a module param, but it would likely be a more effective
> safeguard in practice, e.g. use cases with a fixed number of memslots or
> a well-defined upper bound could use the capability to limit themselves.

Currently QEMU uses KVM_CAP_NR_MEMSLOTS to get the limit and, depending on
where the limit is reached, it either fails gracefully (i.e. it checks
whether a free slot is available before allocating one) or aborts (in cases
where it tries to allocate a slot without checking). A new ioctl() seems
redundant, as we already have an upper limit check (unless it would allow
going over that limit, which in turn defeats the purpose of the limit).
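For reference, the discovery side boils down to a single KVM_CHECK_EXTENSION
call. A minimal standalone sketch of the pattern (a simplified illustration,
not the actual QEMU code; the fallback value mirrors QEMU's historical
default of 32 slots):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);

	if (kvm_fd < 0) {
		perror("open /dev/kvm");
		return 1;
	}

	/* Returns the per-VM memslot limit, or 0 on old kernels that
	 * predate KVM_CAP_NR_MEMSLOTS. */
	int nr_slots = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);

	if (nr_slots <= 0)
		nr_slots = 32;	/* historical fallback */

	printf("max user memslots per VM: %d\n", nr_slots);

	/* A well-behaved VMM compares its used slot count against nr_slots
	 * before issuing KVM_SET_USER_MEMORY_REGION, rather than aborting
	 * when the ioctl fails. */
	close(kvm_fd);
	return 0;
}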
> Thoughts?  An ioctl() feels a little over-engineered, but I suspect that
> adding a module param that defaults to N*KVM_MAX_VCPUS will be a waste,
> e.g. no one will ever touch the param and we'll end up with dead,
> rarely-tested code.
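Just to illustrate how small the module param variant being discussed would
be, a hypothetical sketch (the "max_user_memslots" name, the 0444
permissions and the 4 * KVM_MAX_VCPUS default are placeholders I made up,
not existing KVM code):

#include <linux/module.h>
#include <linux/kvm_host.h>

/* Hypothetical sketch only: neither this parameter nor its default
 * exists in KVM today. */
static unsigned int max_user_memslots = 4 * KVM_MAX_VCPUS;
module_param(max_user_memslots, uint, 0444);
MODULE_PARM_DESC(max_user_memslots,
		 "Per-VM cap on user memslots (placeholder default)");

/* The enforcement point would then be a single bounds check where the
 * slot id is validated, e.g. in kvm_set_memory_region():
 *
 *	if (mem->slot >= max_user_memslots)
 *		return -EINVAL;
 */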