Re: [PATCH v1] memory-hotplug.rst: complete admin-guide overhaul

+ZONE_MOVABLE
+============
+
+ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
+Further, having system RAM managed by ZONE_MOVABLE instead of one of the
+kernel zones can increase the number of possible transparent huge pages and
+dynamically allocated huge pages.
+

I'd move the first two paragraphs from "Zone Imbalances" here to provide
some context on what movable and unmovable allocations are.

Makes sense.

[...]

-How to offline memory
----------------------
+Considerations

ZONE_MOVABLE Sizing Considerations?


Ack

I'd also move the contents of "Boot Memory and ZONE_MOVABLE" here (with
some adjustments):

   By default, all the memory configured at boot time is managed by the kernel
   zones and ZONE_MOVABLE is not used.

   To enable ZONE_MOVABLE to include the memory present at boot and to
   control the ratio between movable and kernel zones there are two command
   line options: ``kernelcore=`` and ``movablecore=``. See
   Documentation/admin-guide/kernel-parameters.rst for their description.


Makes sense. I'll move it to the end of the "ZONE_MOVABLE Sizing Considerations" section.

+--------------
-You can offline a memory block by using the same sysfs interface that was used
-in memory onlining::
+We usually expect that a large portion of available system RAM will actually
+be consumed by user space, either directly or indirectly via the page cache. In
+the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
- % echo offline > /sys/devices/system/memory/memoryXXX/state
+With that in mind, it makes sense that we can have a big portion of system RAM
+managed by ZONE_MOVABLE. However, there are some things to consider when
+using ZONE_MOVABLE, especially when fine-tuning zone ratios:
-If offline succeeds, the state of the memory block is changed to be "offline".
-If it fails, some error core (like -EBUSY) will be returned by the kernel.
-Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
-it.  If it doesn't contain 'unmovable' memory, you'll get success.
+- Having a lot of offline memory blocks. Even offline memory blocks consume
+  memory for metadata and page tables in the direct map; having a lot of
+  offline memory blocks is not a typical case, though.
+
+- Memory ballooning. Some memory ballooning implementations, such as
+  the Hyper-V balloon, the XEN balloon, the vbox balloon and the VMWare

So, everyone except virtio-mem? ;-)

Well, virtio-mem does not classify as a memory balloon in that sense, as it only operates on its own device memory ;)

virtio-balloon and pseries CMM support balloon compaction.

I'd drop the names, because if some of those implement balloon
compaction later, people will surely forget to update the docs.

I can do the opposite and mention the ones that already do. Some will most probably never support it.

"Memory ballooning without balloon compaction is incompatible with ZONE_MOVABLE. Only some implementations, such as virtio-balloon and pseries CMM, fully support balloon compaction."



+  balloon with huge pages don't support balloon compaction and, thereby
+  ZONE_MOVABLE.
+
+  Further, CONFIG_BALLOON_COMPACTION might be disabled. In that case, balloon
+  inflation will only perform unmovable allocations and silently create a
+  zone imbalance, usually triggered by inflation requests from the
+  hypervisor.
+
+- Gigantic pages are unmovable, resulting in user space consuming a
+  lot of unmovable memory.
+
+- Huge pages are unmovable when an architecture does not support huge
+  page migration, resulting in a similar issue as with gigantic pages.
+
+- Page tables are unmovable. Excessive swapping, mapping extremely large
+  files or ZONE_DEVICE memory can be problematic, although only
+  really relevant in corner cases. When we manage a lot of user space memory
+  that has been swapped out or is served from a file/pmem/... we still need

                                                      ^ persistent memory

Agreed.


+  a lot of page tables to manage that memory once user space accessed that
+  memory once.
+
+- DAX: when we have a lot of ZONE_DEVICE memory added to the system as DAX
+  and we are not using an altmap to allocate the memmap from device memory
+  directly, we will have to allocate the memmap for this memory from the
+  kernel zones.

I'm not sure an admin-guide reader will know when we use an altmap and when
we don't. Maybe

   DAX: in certain DAX configurations the memory map for the device memory will
   be allocated from the kernel zones.

Indeed, simpler and communicates the same message.

I'll also add

"KASAN can have a significant memory overhead, for example, consuming 1/8th of the total system memory size as (unmovable) tracking metadata."
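
To put a number on that (illustration only, not part of the patch): a minimal sketch of the arithmetic, assuming the 1/8th ratio quoted above. The function name and the ``shadow_ratio`` parameter are hypothetical; other KASAN configurations may use a different ratio.

```python
GIB = 1 << 30


def kasan_shadow_bytes(total_ram_bytes, shadow_ratio=8):
    """Estimate unmovable KASAN tracking metadata for a given RAM size.

    Assumes one byte of metadata per `shadow_ratio` bytes of RAM
    (1/8th, as stated in the text above).
    """
    return total_ram_bytes // shadow_ratio


# Example: a 64 GiB machine would need about 8 GiB of unmovable metadata,
# which has to fit into the kernel zones, not ZONE_MOVABLE.
print(kasan_shadow_bytes(64 * GIB) / GIB)
```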


Thanks Mike!

--
Thanks,

David / dhildenb





