Re: [PATCH v4 00/18] virtio-mem: Expose device memory through multiple memslots

On Tue, Sep 26, 2023 at 08:57:20PM +0200, David Hildenbrand wrote:
> Quoting from patch #16:
> 
>     Having large virtio-mem devices that only expose little memory to a VM
>     is currently a problem: we map the whole sparse memory region into the
>     guest using a single memslot, resulting in one gigantic memslot in KVM.
>     KVM allocates metadata for the whole memslot, which can result in quite
>     some memory waste.
> 
>     Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
>     1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
>     allocate metadata for that 1 TiB memslot: on x86, this implies allocating
>     a significant amount of memory for metadata:
> 
>     (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
>         -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %)
> 
>         With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
>         allocated lazily when required for nested VMs
>     (2) gfn_track: 2 bytes per 4 KiB
>         -> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
>     (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
>         -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
>     (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
>         -> For 1 TiB: 536870912 = 64 MiB (0.006 %)
> 
>     So we primarily care about (1) and (2). The bad thing is that the
>     memory consumption doubles once SMM is enabled, because we create the
>     memslot once for !SMM and once for SMM.
> 
>     Having a 1 TiB memslot without the TDP MMU consumes around:
>     * With SMM: 5 GiB
>     * Without SMM: 2.5 GiB
>     Having a 1 TiB memslot with the TDP MMU consumes around:
>     * With SMM: 1 GiB
>     * Without SMM: 512 MiB
> 
>     ... and that's really something we want to optimize, to be able to just
>     start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
>     that can grow very large (e.g., 1 TiB).
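As a sanity check, the per-memslot metadata numbers in (1)-(4) above can be
reproduced with plain arithmetic (not code from the series, just the math):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long KiB = 1024ULL, MiB = 1024 * KiB;
        const unsigned long long GiB = 1024 * MiB, TiB = 1024 * GiB;
        const unsigned long long slot = 1 * TiB;            /* memslot size */

        /* (1) rmap: 8 bytes per 4 KiB, 2 MiB and 1 GiB page */
        unsigned long long rmap = 8 * (slot / (4 * KiB)) +
                                  8 * (slot / (2 * MiB)) +
                                  8 * (slot / (1 * GiB));
        /* (2) gfn_track: 2 bytes per 4 KiB page */
        unsigned long long gfn_track = 2 * (slot / (4 * KiB));
        /* (3) lpage_info: 4 bytes per 2 MiB and 1 GiB page */
        unsigned long long lpage = 4 * (slot / (2 * MiB)) +
                                   4 * (slot / (1 * GiB));
        /* (4) two dirty bitmaps: 2 bits per 4 KiB page, converted to bytes */
        unsigned long long bitmaps = 2 * (slot / (4 * KiB)) / 8;

        printf("rmap:      ~%llu MiB\n", rmap / MiB);      /* ~2052 MiB */
        printf("gfn_track: ~%llu MiB\n", gfn_track / MiB); /* ~512 MiB  */
        printf("lpage:     ~%llu MiB\n", lpage / MiB);     /* ~2 MiB    */
        printf("bitmaps:   ~%llu MiB\n", bitmaps / MiB);   /* ~64 MiB   */
        return 0;
    }

Note that the 536870912 in (4) is a bit count (the two bitmaps add up to
64 MiB), whereas the numbers in (1)-(3) are byte counts.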
> 
>     Consequently, using multiple memslots and only mapping the memslots we
>     really need can significantly reduce memory waste and speed up
>     memslot-related operations. Let's expose the sparse RAM memory region using
>     multiple memslots, mapping only the memslots we currently need into our
>     device memory region container.
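Structurally (a simplified sketch, not the actual series code): the device keeps
one memory region container for its address range and maps memslot-sized aliases
of the sparse RAM region into it on demand, so KVM and vhost only ever see the
memslots that are currently needed. Roughly:

    /*
     * Simplified sketch of the container + alias idea (hypothetical helper,
     * not taken from this series); relies on QEMU's "exec/memory.h".
     */
    #include "exec/memory.h"

    static void map_memslot(MemoryRegion *container, MemoryRegion *ram,
                            MemoryRegion *alias, unsigned int idx,
                            uint64_t memslot_size)
    {
        const uint64_t offset = (uint64_t)idx * memslot_size;

        /* Carve one memslot-sized alias out of the big sparse RAM region ... */
        memory_region_init_alias(alias, NULL, "virtio-mem-memslot", ram,
                                 offset, memslot_size);
        /* ... and map only that alias into the device memory region container. */
        memory_region_add_subregion(container, offset, alias);
    }

    /* Unmapping works the other way around via memory_region_del_subregion(). */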
> 
> The hyper-v balloon driver has similar demands [1].
> 
> For virtio-mem, this has to be turned on manually ("dynamic-memslots=on"),
> due to the interaction with vhost (see below).
> 
> If we have less than 509 memslots available, we always default to a single
> memslot. Otherwise, we automatically decide how many memslots to use
> based on a simple heuristic (see patch #12), and try not to use more than
> 256 memslots across all memory devices: our historical DIMM limit.
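To illustrate the shape of that decision (a hypothetical, per-device
simplification; the real heuristic in patch #12 also shares the budget with
other memslot consumers and across all memory devices):

    #include <stdint.h>

    /*
     * Hypothetical sketch, not the actual patch #12 code: fall back to a
     * single memslot when memslots are scarce, otherwise split the device
     * region while staying below the historical 256-memslot limit.
     */
    static unsigned int memslots_for_device(unsigned int free_memslots,
                                            uint64_t region_size,
                                            uint64_t memslot_size)
    {
        uint64_t wanted;

        if (free_memslots < 509) {
            return 1;   /* scarce memslots: keep the old single-memslot layout */
        }
        wanted = region_size / memslot_size;
        if (wanted > 256) {
            wanted = 256;
        }
        return wanted ? (unsigned int)wanted : 1;
    }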
> 
> As soon as any memory device automatically decides to use more than one
> memslot, vhost devices that support less than 509 memslots (e.g., currently
> most vhost-user devices, such as virtiofsd) can no longer be plugged, as a
> precaution.
> 
> Quoting from patch #12:
> 
>     Plugging vhost devices with less than 509 memslots available while we
>     have memory devices plugged that consume multiple memslots due to
>     automatic decisions can be problematic. Most configurations might just
>     fail due to "limit < used + reserved"; however, it can also happen that
>     these memory devices would suddenly consume memslots that would actually
>     be required by other memslot consumers (boot, PCI BARs) later. Note that
>     this has always been sketchy with vhost devices that support only a small
>     number of memslots, but we don't want to make it any worse. So let's keep
>     it simple and simply reject plugging such vhost devices in such a
>     configuration.
> 
>     Eventually, all vhost devices that want to be fully compatible with such
>     memory devices should support a decent number of memslots (>= 509).
> 
> 
> The recommendation is to plug such vhost devices before the virtio-mem
> device decides on its memslot layout, or to not set "dynamic-memslots=on".
> As soon as these vhost devices support a reasonable number of memslots
> (>= 509), this will start working automatically.
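For illustration, enabling the new behaviour could look something like this
(device and backend names, and the sizes, are just placeholders; "memdev" and
"requested-size" are existing virtio-mem properties, "dynamic-memslots" is the
new one added by this series):

    qemu-system-x86_64 -accel kvm -m 4G,maxmem=1028G \
        -object memory-backend-ram,id=mem0,size=1T \
        -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G,dynamic-memslots=on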
> 
> I ran some tests on x86_64, now also including vfio and migration tests.
> Seems to work as expected, even when multiple memslots are used.
> 
> 
> Patches #1 -- #3 are from [2] and were not picked up yet.
> 
> Patches #4 -- #12 add handling of multiple memslots to memory devices.
> 
> Patches #13 -- #16 add "dynamic-memslots=on" support to virtio-mem.
> 
> Patches #17 -- #18 make sure that virtio-mem memslots can be
>              enabled/disabled atomically.


Reviewed-by: Michael S. Tsirkin <mst@xxxxxxxxxx>

Please feel free to merge.


> v3 -> v4:
> * "virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb"
>  -> Cleanup patch added
> * "virtio-mem: Update state to match bitmap as soon as it's been migrated"
>  -> Cleanup patch added
> * "virtio-mem: Expose device memory dynamically via multiple memslots if
>    enabled"
>  -> Parameter now called "dynamic-memslots"
>  -> With "dynamic-memslots=off", don't use a memory region container and
>     just use the old handling: always map the RAM memory region [thus the
>     new parameter name]
>  -> Require "unplugged-inaccessible=on" (default) with
>     "dynamic-memslots=on" for simplicity
>  -> Take care of proper migration handling
>  -> Remove accidental additional busy check in virtio_mem_unplug_all()
>  -> Minor comment cleanups
>  -> Dropped RB because of changes
> 
> v2 -> v3:
> * "kvm: Return number of free memslots"
>  -> Return 0 in stub
> * "kvm: Add stub for kvm_get_max_memslots()"
>  -> Return 0 in stub
> * Adjust other patches to check for kvm_enabled() before calling
>   kvm_get_free_memslots()/kvm_get_max_memslots()
> * Add RBs
> 
> v1 -> v2:
> * Include patches from [1]
> * A lot of code simplification and reorganization, too much to spell out
> * Don't add a general soft-limit on memslots, to avoid warnings in sane
>   setups
> * Simplify handling of vhost devices with a small number of memslots:
>   simply fail plugging them
> * "virtio-mem: Expose device memory via multiple memslots if enabled"
>  -> Fix one "is this the last memslot" check
> * Much more testing
> 
> 
> [1] https://lkml.kernel.org/r/cover.1689786474.git.maciej.szmigiero@xxxxxxxxxx
> [2] https://lkml.kernel.org/r/20230523185915.540373-1-david@xxxxxxxxxx
> 
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Igor Mammedov <imammedo@xxxxxxxxxx>
> Cc: Xiao Guangrong <xiaoguangrong.eric@xxxxxxxxx>
> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
> Cc: "Philippe Mathieu-Daudé" <philmd@xxxxxxxxxx>
> Cc: Eduardo Habkost <eduardo@xxxxxxxxxxx>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@xxxxxxxxx>
> Cc: Yanan Wang <wangyanan55@xxxxxxxxxx>
> Cc: Michal Privoznik <mprivozn@xxxxxxxxxx>
> Cc: Daniel P. Berrangé <berrange@xxxxxxxxxx>
> Cc: Gavin Shan <gshan@xxxxxxxxxx>
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
> Cc: Maciej S. Szmigiero <mail@xxxxxxxxxxxxxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx
> 
> David Hildenbrand (18):
>   vhost: Rework memslot filtering and fix "used_memslot" tracking
>   vhost: Remove vhost_backend_can_merge() callback
>   softmmu/physmem: Fixup qemu_ram_block_from_host() documentation
>   kvm: Return number of free memslots
>   vhost: Return number of free memslots
>   memory-device: Support memory devices with multiple memslots
>   stubs: Rename qmp_memory_device.c to memory_device.c
>   memory-device: Track required and actually used memslots in
>     DeviceMemoryState
>   memory-device,vhost: Support memory devices that dynamically consume
>     memslots
>   kvm: Add stub for kvm_get_max_memslots()
>   vhost: Add vhost_get_max_memslots()
>   memory-device,vhost: Support automatic decision on the number of
>     memslots
>   memory: Clarify mapping requirements for RamDiscardManager
>   virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb
>   virtio-mem: Update state to match bitmap as soon as it's been migrated
>   virtio-mem: Expose device memory dynamically via multiple memslots if
>     enabled
>   memory,vhost: Allow for marking memory device memory regions
>     unmergeable
>   virtio-mem: Mark memslot alias memory regions unmergeable
> 
>  MAINTAINERS                                   |   1 +
>  accel/kvm/kvm-all.c                           |  35 +-
>  accel/stubs/kvm-stub.c                        |   9 +-
>  hw/mem/memory-device.c                        | 196 ++++++++++-
>  hw/virtio/vhost-stub.c                        |   9 +-
>  hw/virtio/vhost-user.c                        |  21 +-
>  hw/virtio/vhost-vdpa.c                        |   1 -
>  hw/virtio/vhost.c                             | 103 +++++-
>  hw/virtio/virtio-mem-pci.c                    |  21 ++
>  hw/virtio/virtio-mem.c                        | 330 +++++++++++++++++-
>  include/exec/cpu-common.h                     |  15 +
>  include/exec/memory.h                         |  27 +-
>  include/hw/boards.h                           |  14 +-
>  include/hw/mem/memory-device.h                |  57 +++
>  include/hw/virtio/vhost-backend.h             |   9 +-
>  include/hw/virtio/vhost.h                     |   3 +-
>  include/hw/virtio/virtio-mem.h                |  32 +-
>  include/sysemu/kvm.h                          |   4 +-
>  include/sysemu/kvm_int.h                      |   1 +
>  softmmu/memory.c                              |  35 +-
>  softmmu/physmem.c                             |  17 -
>  .../{qmp_memory_device.c => memory_device.c}  |  10 +
>  stubs/meson.build                             |   2 +-
>  23 files changed, 839 insertions(+), 113 deletions(-)
>  rename stubs/{qmp_memory_device.c => memory_device.c} (56%)
> 
> -- 
> 2.41.0



