Re: Question: Using online_pages/offline_pages() with granularity < mem section size

On Fri, Mar 2, 2018 at 7:23 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
> Hi,
>
> in the context of virtualization, I am experimenting right now with an
> approach to plug/unplug memory using a paravirtualized interface (not
> ACPI). And I stumbled over certain things, looking at the memory
> hot/unplug code.
>
> The big picture:
>
> A paravirtualized device provides a physical memory region to the guest.
> We could have multiple such devices. Each device is assigned to a NUMA
> node. We want to control how much memory in such a region the guest is
> allowed to use. We can dynamically add/remove memory to NUMA nodes this
> way and make sure a guest cannot make use of more memory than requested.
>
> Especially: We decide in the kernel which memory block to online/offline.
>
>
> The basic mechanism:
>
> The hypervisor provides a physical memory region to the guest. This
> memory region can be used by the guest to plug/unplug memory. The
> hypervisor asks for a certain amount of used memory and the guest should
> try to reach that goal, by plugging/unplugging memory. Whenever the
> guest wants to plug/unplug a block, it has to communicate that to the
> hypervisor.
>
> The hypervisor can grant/deny requests to plug/unplug a block of memory.
> Especially, the guest must not take more memory than requested. Trying
> to read unplugged memory succeeds (e.g. for kdump), writing to that
> memory is prohibited.
>
> Memory blocks can be of any granularity, but 1-4MB looks like a sane
> amount to not fragment memory too much. If the guest can't find free
> memory blocks, no unplug is possible.
>
>
> In the guest, I add_memory() new memory blocks to the NORMAL zone. The
> NORMAL zone makes it harder to remove memory, but this way we don't run
> into problems such as too little NORMAL memory (e.g. for page tables). Now,
> these chunks are fairly big (>= 128MB) and there seems to be no way to
> plug/unplug smaller chunks to Linux using official interfaces ("memory
> segments"). Trying to remove >=128MB of NORMAL memory will usually not
> succeed. So I thought about manually removing parts of a memory section.
>
> Yes, this sounds similar to a balloon, but it is different: I have to
> offline memory in a certain memory range, not just any memory in the
> system. So I cannot simply use kmalloc() - there is no allocator that
> guarantees that.
>
> So instead I went ahead and thought about simply manually
> offlining/onlining parts of a memory segment - especially "page blocks".
> I do my own bookkeeping about which parts of a memory segment are
> online/offline and use that information for finding blocks to
> plug/unplug. The offline_pages() interface made me assume that this
> should work with blocks in the size of pageblock_nr_pages.
>
>
> I stumbled over the following two problems:
>
> 1. __offline_isolated_pages() doesn't care about page blocks, it simply
> calls offline_mem_sections(), which marks the whole section as offline,
> although it has to remain online until all pages in that section were
> offlined. Now this can be handled by moving the offline_mem_sections()
> logic further outside to the caller of offline_pages().
>
> 2. While offlining 2MB blocks (page block size), I discovered that more
> memory was marked as reserved. Especially, a page block contains pages
> with an order 10 (4MB), which implies that two page blocks are "bound
> together". This is also done in __offline_isolated_pages(). Offlining
> 2MB will result in 4MB being marked as reserved.
>
> Now, when I switch to 4MB, my manual online_pages/offline_pages seems so
> far to work fine.
>
> So my questions are:
>
> Can I assume that online_pages/offline_pages() works with "MAX_ORDER -
> 1" sizes reliably? Should the checks in these functions be updated? Page
> blocks do not seem to be the real deal.
>
> Any better approach to allocate memory in a specific memory range
> (without fake numa nodes)? So I could avoid using
> online_pages/offline_pages and instead do it similar to a balloon
> driver? (mark the page as reserved myself)
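
To make the sizes in problem 2 concrete, here is a minimal user-space
sketch of the arithmetic. It assumes the x86-64 values mentioned in the
mail (4KB base pages, 2MB page blocks, order-10 buddy pages, 128MB
sections). The SKETCH_* constants and both helper functions are
hypothetical names for illustration, not kernel code, and nothing is
read from a running kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values taken from the mail above; illustration only. */
#define SKETCH_PAGE_SHIFT      12  /* 4KB base pages */
#define SKETCH_PAGEBLOCK_ORDER 9   /* 2MB page blocks */
#define SKETCH_MAX_ORDER       11  /* largest buddy order is MAX_ORDER - 1 */
#define SKETCH_SECTION_SHIFT   27  /* 128MB memory sections */

/* Size in bytes of a run of 2^order base pages. */
static uint64_t sketch_order_to_bytes(unsigned int order)
{
	return UINT64_C(1) << (order + SKETCH_PAGE_SHIFT);
}

/*
 * Round a request up to a multiple of the largest buddy page size, so
 * that a range never covers only half of a free order-10 page.
 */
static uint64_t sketch_safe_offline_bytes(uint64_t request)
{
	uint64_t unit = sketch_order_to_bytes(SKETCH_MAX_ORDER - 1);

	assert(request > 0);
	return (request + unit - 1) / unit * unit;
}
```

With these numbers a 2MB request rounds up to 4MB, which matches the
"offlining 2MB marks 4MB as reserved" observation, and a 128MB section
holds 32 such 4MB units.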

Not sure this answers your questions, but I did play with sub-section
memory hotplug last year in the patch set below. It fell to the bottom
of my queue, but at least at the time it seemed possible to remove the
section alignment constraints of memory hotplug.

https://lists.01.org/pipermail/linux-nvdimm/2017-March/009167.html
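
On the bookkeeping side (problem 1 in the quoted mail), the rule that a
section has to stay online until every sub-block in it has been
offlined could be tracked roughly like this. This is only a user-space
sketch: struct sketch_section, the sketch_* helpers and the figure of
32 blocks per section (128MB / 4MB) are made up for illustration and
are not taken from the kernel or from the patch set above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: 128MB section split into 4MB sub-blocks. */
#define SKETCH_BLOCKS_PER_SECTION 32

/* Hypothetical per-section state: bit i set means sub-block i is online. */
struct sketch_section {
	uint32_t online_mask;
};

/* Record that sub-block blk of this section has been onlined. */
static void sketch_block_online(struct sketch_section *s, unsigned int blk)
{
	assert(blk < SKETCH_BLOCKS_PER_SECTION);
	s->online_mask |= UINT32_C(1) << blk;
}

/*
 * Record that sub-block blk has been offlined. Returns true only when
 * no sub-block is left online, i.e. when the caller may mark the whole
 * section offline.
 */
static bool sketch_block_offline(struct sketch_section *s, unsigned int blk)
{
	assert(blk < SKETCH_BLOCKS_PER_SECTION);
	s->online_mask &= ~(UINT32_C(1) << blk);
	return s->online_mask == 0;
}
```

A caller would mark the section offline only when sketch_block_offline()
returns true, rather than on the first partial offline.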
