On Thu, Feb 18, 2021 at 14:31:00 +0100, Michal Privoznik wrote: > The virtio-mem is paravirtualized mechanism of adding/removing > memory to/from a VM. A virtio-mem-pci device is split into blocks > of equal size which are then exposed (all or only a requested > portion of them) to the guest kernel to use as regular memory. > Therefore, the device has two important attributes: > > 1) block-size, which defines the size of a block > 2) requested-size, which defines how much memory (in bytes) > is the device requested to expose to the guest. > > The 'block-size' is configured on command line and immutable > throughout device's lifetime. The 'requested-size' can be set on > the command line too, but also is adjustable via monitor. In > fact, that is how management software places its requests to > change the memory allocation. If it wants to give more memory to > the guest it changes 'requested-size' to a bigger value, and if it > wants to shrink guest memory it changes the 'requested-size' to a > smaller value. Note, value of zero means that guest should > release all memory offered by the device. Of course, guest has to > cooperate. Therefore, there is a third attribute 'size' which is > read only and reflects how much memory the guest still has. This > can be different to 'requested-size', obviously. Because of name > clash, I've named it 'actualsize' and it is dealt with in future > commits (it is a runtime information anyway). > > In the backend, memory for virtio-mem is backed by usual objects: > memory-backend-{ram,file,memfd} and their size puts the cap on > the amount of memory that a virtio-mem device can offer to a > guest. But we are already able to express this info using <size/> > under <target/>. > > Therefore, we need only two more elements to cover 'block-size' > and 'requested-size' attributes. This is the XML I've came up > with: > > <memory model='virtio-mem'> > <source> > <nodemask>1-3</nodemask> > <pagesize unit='KiB'>2048</pagesize> > </source> > <target> > <size unit='KiB'>2097152</size> > <node>0</node> > <block unit='KiB'>2048</block> > <requested unit='KiB'>1048576</requested> > </target> > <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> > </memory> > > I hope by now it is obvious that: > > 1) 'requested-size' must be an integer multiple of > 'block-size', and > 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI > address. > > Then there is a limitation that the minimal 'block-size' is > transparent huge page size (I'll leave this without explanation). > > Since now we have (possibly) two or more devices that allow > memory inflation/deflation and accounting for all of them (and > thus keeping <currentMemory/> updated) might be hard. Therefore, > I'm deliberately forbidding memballoon. It's okay - virtio-mem is > superior to memballoon anyway. We can always reevaluate later. > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > --- > docs/formatdomain.rst | 51 +++++++++++++-- > docs/schemas/domaincommon.rng | 11 ++++ > src/conf/domain_conf.c | 53 ++++++++++++++- > src/conf/domain_conf.h | 3 + > src/conf/domain_validate.c | 39 +++++++++++ > src/qemu/qemu_alias.c | 1 + > src/qemu/qemu_command.c | 1 + > src/qemu/qemu_domain.c | 27 +++++++- > src/qemu/qemu_domain_address.c | 38 ++++++++--- > src/qemu/qemu_process.c | 2 + > src/qemu/qemu_validate.c | 22 ++++++- > src/security/security_apparmor.c | 1 + > src/security/security_dac.c | 2 + > src/security/security_selinux.c | 2 + > .../memory-hotplug-virtio-mem.xml | 64 +++++++++++++++++++ > ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 + > tests/qemuxml2xmltest.c | 1 + > 17 files changed, 298 insertions(+), 21 deletions(-) > create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml > create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst > index 2587106191..05d359a32d 100644 > --- a/docs/formatdomain.rst > +++ b/docs/formatdomain.rst [...] > @@ -6935,6 +6937,16 @@ Example: automatically added device with KVM > </devices> > ... > > +Example: automatically added device with QEMU/KVM and a ``virtio-mem`` device: > + > +:: > + > + ... > + <devices> The example is missing the 'virtio-mem' device which causes the balloon to be disabled. > + <memballoon model='none'/> > + </devices> > + ... > + > Example: manually added device with static PCI slot 2 requested > > :: [...] > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c > index 541d592bbe..26b9b5583e 100644 > --- a/src/qemu/qemu_domain.c > +++ b/src/qemu/qemu_domain.c > @@ -3679,10 +3679,21 @@ qemuDomainDefAddDefaultDevices(virDomainDefPtr def, > } > > if (addDefaultMemballoon && !def->memballoon) { > - virDomainMemballoonDefPtr memballoon; > - memballoon = g_new0(virDomainMemballoonDef, 1); > + virDomainMemballoonDefPtr memballoon = g_new0(virDomainMemballoonDef, 1); > + size_t i; > > - memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; > + /* To simplify virtio-mem implementation, memballoon has to be turned > + * off if domain has a virtio-mem device. See > + * qemuValidateDomainDeviceDefMemory() for more details. */ > + for (i = 0; i < def->nmems; i++) { > + if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) > + break; > + } > + > + if (i == def->nmems) > + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO; > + else > + memballoon->model = VIR_DOMAIN_MEMBALLOON_MODEL_NONE; > def->memballoon = memballoon; > } This might be a point of contention. I can see why we'd want to do this but people might have other ideas. Preferably I'd like danpb's input. I think the best approach will be to not hide it into the massive patch adding the boring XML bits but split it separately. [...] > diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c > index 2541ae856a..6150083251 100644 > --- a/src/qemu/qemu_validate.c > +++ b/src/qemu/qemu_validate.c [...] > @@ -4688,6 +4689,25 @@ qemuValidateDomainDeviceDefMemory(virDomainMemoryDefPtr mem, > } > break; > > + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM: > + /* Accounting balloon and virtio-mem is hard. We have plenty of APIs > + * which take balloon from QEMU and report it to users. We would have > + * to change all that and account for virtio-mem actual size. Also, > + * virtio-mem is supposed to be replacement for balloon. Disable > + * coexistence of these two for now. */ > + if (virDomainDefHasMemballoon(def)) { > + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > + _("virtio-mem is not supported with memory balloon")); > + return -1; > + } > + > + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI)) { > + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > + _("virtio-mem isn't supported by this QEMU binary")); > + return -1; > + } > + break; And this one too. I agree with this hunk as is, but not in this commit.