On Fri, Apr 23, 2021 at 15:24:36 +0200, Michal Privoznik wrote:
> This commit adds new memorydevices.rst page which should serve
> all models of memory devices. Yet, I'm documenting virtio-mem
> quirks only.
>
> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
> ---
>  docs/kbase/index.rst         |   4 +
>  docs/kbase/memorydevices.rst | 150 +++++++++++++++++++++++++++++++++++
>  docs/kbase/meson.build       |   1 +
>  3 files changed, 155 insertions(+)
>  create mode 100644 docs/kbase/memorydevices.rst
>
> diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
> index 532804fe05..45450bf33b 100644
> --- a/docs/kbase/index.rst
> +++ b/docs/kbase/index.rst
> @@ -46,6 +46,10 @@ Usage
>  `PCI topology <../pci-addresses.html>`__
>     Addressing schemes for PCI devices
>
> +`Memory devices <memorydevices.html>`__
> +   Memory devices and their use
> +
> +
>  Internals / Debugging
>  ---------------------
>
> diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst
> new file mode 100644
> index 0000000000..5c4c45a77f
> --- /dev/null
> +++ b/docs/kbase/memorydevices.rst
> @@ -0,0 +1,150 @@
> +==============
> +Memory devices
> +==============
> +
> +.. contents::
> +
> +Basics
> +======
> +
> +Memory devices can be divided into two families: DIMMs and NVDIMMs. The former

DIMMs/NVDIMMs description is misleading since you then put virtio-mem
under dimm. I'd suggest you use 'volatile' and 'non-volatile'

> +is typical RAM memory: it's volatile and thus its contents doesn't survive
> +reboots nor guest shut downs and power ons. The latter retains its contents
> +across reboots or power outages.
> +
> +In Libvirt, there are two models for DIMMs:
> +
> +* ``dimm`` model:
> +
> +  ::
> +
> +    <memory model='dimm'>
> +      <target>
> +        <size unit='KiB'>523264</size>
> +        <node>0</node>
> +      </target>
> +      <address type='dimm' slot='0'/>
> +    </memory>
> +
> +* ``virtio-mem`` model:
> +
> +  ::
> +
> +    <memory model='virtio-mem'>
> +      <target>
> +        <size unit='KiB'>1048576</size>
> +        <node>0</node>
> +        <block unit='KiB'>2048</block>
> +        <requested unit='KiB'>524288</requested>
> +      </target>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
> +    </memory>
> +
> +Then there are two models for NVDIMMs:
> +
> +* ``nvidmm`` model:
> +
> +  ::
> +
> +    <memory model='nvdimm'>
> +      <source>
> +        <path>/tmp/nvdimm</path>
> +      </source>
> +      <target>
> +        <size unit='KiB'>523264</size>
> +        <node>0</node>
> +      </target>
> +      <address type='dimm' slot='0'/>
> +    </memory>
> +
> +* ``virtio-pmem`` model:
> +
> +  ::
> +
> +    <memory model='virtio-pmem' access='shared'>
> +      <source>
> +        <path>/tmp/virtio_pmem</path>
> +      </source>
> +      <target>
> +        <size unit='KiB'>524288</size>
> +      </target>
> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
> +    </memory>
> +
> +
> +Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus
> +instead of DIMM slots.

This weird note won't be needed if you clarify the volatility in the
first place. You can then mention the differences for each device type
along with advantages etc ...

> +
> +Furthermore, DIMMs can have ``<source/>`` element which configures backend for
> +devices. For NVDIMMs the element is mandatory and reflects where the contents
> +is saved.
> +
> +See `memory devices documentation <../formatdomain.html#elementsMemory>`_.
> +
> +``virtio-mem`` model
> +====================
> +
> +The ``virtio-mem`` model can be viewed as revised memory balloon.
> +It offers

IMO it's closer to better memory hotplug with semantics closer to the
balloon

> +adding and removing memory (without the actual hotplug of the device). It
> +solves problems that memory balloon can't solve on its own and thus is more
> +flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus
> +memory can be inflated/deflated only for a subset of guest NUMA nodes. Also,
> +it works with chunks that are either exposed to guest or taken back from it.
> +
> +See https://virtio-mem.gitlab.io/
> +
> +Under the hood, ``virtio-mem`` device is split into chunks of equal size which
> +are then exposed to the guest. Either all of them or only a portion depending
> +on user's request. Therefore there are three important sizes for
> +``virtio-mem``. All are to be found under ``<target/>`` element:
> +
> +#. The maximum size the device can ever offer, exposed under ``<size/>``
> +#. The size of a single block, exposed under ``<block/>``
> +#. The current size exposed to the guest, exposed under ``<requested/>``
> +
> +For instance, the following example the maximum size is 4GiB, the block size is
> +2MiB and only 1GiB should be exposed to the guest:
> +
> +  ::
> +
> +    <memory model='virtio-mem'>
> +      <target>
> +        <size unit='KiB'>4194304</size>
> +        <block unit='KiB'>2048</block>
> +        <requested unit='KiB'>1048576</requested>
> +      </target>
> +    </memory>
> +
> +Please note that ``<requested/>`` must be an integer multiple of ``<block/>``
> +size or zero (no blocks exposed to the guest) and has to be less or equal to
> +``<size/>`` (all blocks exposed to the guest). Furthermore, QEMU recommends the
> +``<block/>`` size to be as big as a Transparent Huge Page (usually 2MiB).
> +
> +To change the size exposed to the guest, users should pass memory device XML
> +with nothing but ``<requested/>`` changed into the
> +``virDomainUpdateDeviceFlags()`` API.
> +For user's convenience this can be done
> +via virsh too:
> +
> +  ::
> +
> +    # virsh update-memory-device $dom --requested size 2GiB

--requested-size

> +
> +If there are two or more ``<memory/>`` devices then ``--alias`` shall be used
> +to tell virsh which memory device should be updated.
> +
> +For running guests there is fourth size that can be found under ``<target/>``:
> +
> +  ::
> +
> +    <actual unit='KiB'>2097152</actual>
> +
> +The ``<actual/>`` reflects the actual size used by the guest. In general it
> +can differ from ``<requested/>``. Reasons include guest kernel missing
> +``virtio-mem`` module and thus being unable to take offered memory, or guest
> +kernel being unable to free memory. Since ``<actual/>`` only reports size to
> +users, the element is never parsed. It is formatted only into live XML.
> +
> +Since changing actual allocation requires cooperation with guest kernel,
> +requests for change are not instant. Therefore, libvirt emits
> +``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever actual
> +allocation changed.

Reviewed-by: Peter Krempa <pkrempa@xxxxxxxxxx>
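
As a side note on the sizing rules quoted above (``<requested/>`` is zero
or an integer multiple of ``<block/>``, and at most ``<size/>``), they can
be sketched as a small check. This is only an illustrative Python sketch:
check_virtio_mem_sizes() is a hypothetical helper, not a libvirt or QEMU
API, and it assumes all three values were already converted to KiB.

```python
def check_virtio_mem_sizes(size_kib, block_kib, requested_kib):
    """Return a list of violated rules; an empty list means the sizes fit.

    Hypothetical helper for illustration only. All arguments are
    assumed to be integers expressed in KiB.
    """
    problems = []

    if block_kib <= 0:
        # Without a positive block size the remaining checks make no sense.
        return ["<block/> must be a positive size"]

    if requested_kib < 0:
        problems.append("<requested/> must not be negative")

    # Zero passes this test too: 0 % block_kib == 0 (no blocks exposed).
    if requested_kib % block_kib != 0:
        problems.append("<requested/> must be an integer multiple of <block/>")

    # Equality is allowed: it means all blocks are exposed to the guest.
    if requested_kib > size_kib:
        problems.append("<requested/> must be less than or equal to <size/>")

    return problems


if __name__ == "__main__":
    # Values from the quoted example: 4GiB maximum, 2MiB blocks, 1GiB requested.
    print(check_virtio_mem_sizes(4194304, 2048, 1048576))
```

With the values from the quoted example the list of problems is empty; a
request of 1000 KiB would be flagged because it is not a multiple of the
2048 KiB block size.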