On Wed, Jan 20, 2021, Vitaly Kuznetsov wrote:
> Igor Mammedov <imammedo@xxxxxxxxxx> writes:
>
> > On Tue, 19 Jan 2021 09:20:42 -0800
> > Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> >> Were you planning on adding a capability to check for the new and improved
> >> memslots limit, e.g. to know whether or not KVM might die on a large VM?
> >> If so, requiring the VMM to call an ioctl() to set a higher (or lower?)
> >> limit would be another option.  That wouldn't have the same permission
> >> requirements as a module param, but it would likely be a more effective
> >> safeguard in practice, e.g. use cases with a fixed number of memslots or
> >> a well-defined upper bound could use the capability to limit themselves.
> >
> > Currently QEMU uses KVM_CAP_NR_MEMSLOTS to get the limit, and depending on
> > where the limit is hit it either fails gracefully (i.e. it checks whether
> > a free slot is available before slot allocation) or aborts (in the case
> > where it tries to allocate a slot without checking).
>
> FWIW, the 'synic problem' causes it to abort.
>
> > A new ioctl() seems redundant as we already have an upper limit check
> > (unless it would allow going over that limit, which in turn defeats the
> > purpose of the limit).

Gah, and I even looked for an ioctl().  No idea how I didn't find NR_MEMSLOTS.

> Right, I didn't plan to add any new CAP as what we already have should be
> enough to query the limits.  Having an ioctl to set the upper limit seems
> complicated: with device and CPU hotplug it may not be easy to guess what it
> should be upfront, so VMMs will likely just add a call to raise the limit in
> their memslot modification code, and it won't provide any new protection.

True.  Maybe the best approach, if we want to make the limit configurable,
would be to make a lower limit opt-in, i.e. default KVM_CAP_NR_MEMSLOTS to
SHRT_MAX, with an ioctl() to set a lower limit (rough sketch of the VMM side
at the end of this mail).  That would also allow us to defer adding a new
ioctl() until someone actually plans on using it.

> >> Thoughts?  An ioctl() feels a little over-engineered, but I suspect that
> >> adding a module param that defaults to N*KVM_MAX_VCPUS will be a waste,
> >> e.g. no one will ever touch the param and we'll end up with dead,
> >> rarely-tested code.
>
> Alternatively, we can hard-code KVM_USER_MEM_SLOTS to N*KVM_MAX_VCPUS so no
> new parameter is needed, but personally I'd prefer to have it configurable
> (in case we decide to set it to something lower than SHRT_MAX, of course)
> even if it won't be altered very often (which is a good thing for 'general
> purpose' usage, right?).
>
> First, it will allow tightening the limit for some very specific deployments
> (e.g. FaaS/Firecracker-style) to, say, '20', which should be enough.

A module param likely isn't usable for many such deployments though, as it
would require a completely isolated pool of systems.  That's why an ioctl()
is appealing; the expected number of memslots is a property of the VM, not of
the platform.

> Second, we may be overlooking some configurations where the number of
> memslots actually depends on the number of vCPUs, but nobody has complained
> so far just because these configurations use a fairly small number and the
> ceiling hasn't been hit yet.
>
> One more spare thought: per-vCPU memslots may come in handy if someone
> decides to move some of the KVM PV features to userspace, e.g. I can imagine
> an attempt to move async_pf out of the kernel.

Memslots aren't per-vCPU though; any userspace feature that tries to treat
them as such will be flawed in some way.
Practically speaking, per-vCPU memslots just aren't feasible, at least not
without KVM changes that are likely unpalatable, e.g. KVM would need to
incorporate the existence of a vCPU-specific memslot into the MMU role.
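
To make the VM-scoped nature of memslots concrete, here's a minimal userspace
sketch (not from any real VMM, error handling trimmed): both the capability
query and the slot registration go through fds with no vCPU association, and
struct kvm_userspace_memory_region has no vCPU field to begin with.

  #include <fcntl.h>
  #include <linux/kvm.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>

  int main(void)
  {
          int kvm = open("/dev/kvm", O_RDWR);
          int vm = ioctl(kvm, KVM_CREATE_VM, 0);

          /* QEMU-style query of the per-VM memslot ceiling. */
          int max_slots = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_NR_MEMSLOTS);
          printf("KVM_CAP_NR_MEMSLOTS = %d\n", max_slots);

          /* Back a 2MiB guest physical range with anonymous memory. */
          size_t size = 0x200000;
          void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

          /*
           * The slot is installed via the VM fd; no vCPU in sight.  A
           * graceful VMM checks max_slots before getting to this point.
           */
          struct kvm_userspace_memory_region region = {
                  .slot = 0,
                  .guest_phys_addr = 0,
                  .memory_size = size,
                  .userspace_addr = (unsigned long)mem,
          };
          return ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) ? 1 : 0;
  }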
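
And circling back to the opt-in lower limit: if that were ever implemented,
I'd expect it to ride on the existing KVM_ENABLE_CAP machinery rather than a
brand new ioctl().  To be clear, treating KVM_CAP_NR_MEMSLOTS as a settable
cap below is pure invention for illustration; no such interface exists today.

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /*
   * Hypothetical: opt a VM into a lower memslot ceiling.  KVM_ENABLE_CAP
   * is real, but KVM_CAP_NR_MEMSLOTS is not a settable cap today -- this
   * only sketches what the VMM side could look like.  A Firecracker-style
   * VMM with a fixed device model might call limit_memslots(vm_fd, 20).
   */
  static int limit_memslots(int vm_fd, unsigned int max)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_NR_MEMSLOTS,     /* hypothetical */
                  .args[0] = max,                 /* new, lower ceiling */
          };
          /* Default stays SHRT_MAX; VMs that don't opt in lose nothing. */
          return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }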