RE: [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rjw@xxxxxxx]
> Sent: Wednesday, August 07, 2013 4:49 PM
> To: Tang Chen
> Cc: Moore, Robert; Zheng, Lv; lenb@xxxxxxxxxx; tglx@xxxxxxxxxxxxx;
> mingo@xxxxxxx; hpa@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; tj@xxxxxxxxxx;
> trenn@xxxxxxx; yinghai@xxxxxxxxxx; jiang.liu@xxxxxxxxxx;
> wency@xxxxxxxxxxxxxx; laijs@xxxxxxxxxxxxxx;
> isimatu.yasuaki@xxxxxxxxxxxxxx; izumi.taku@xxxxxxxxxxxxxx;
> mgorman@xxxxxxx; minchan@xxxxxxxxxx; mina86@xxxxxxxxxx;
> gong.chen@xxxxxxxxxxxxxxx; vasilis.liaskovitis@xxxxxxxxxxxxxxxx;
> lwoodman@xxxxxxxxxx; riel@xxxxxxxxxx; jweiner@xxxxxxxxxx;
> prarit@xxxxxxxxxx; zhangyanfei@xxxxxxxxxxxxxx; yanghy@xxxxxxxxxxxxxx;
> x86@xxxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 00/25] Arrange hotpluggable memory as ZONE_MOVABLE.
> 
> On Wednesday, August 07, 2013 06:51:51 PM Tang Chen wrote:
> > This patch-set aims to solve some problems at system boot time to
> > enhance memory hotplug functionality.
> >
> > [Background]
> >
> > The Linux kernel cannot migrate pages used by the kernel because of
> > the kernel direct mapping. Since va = pa + PAGE_OFFSET, if the
> > physical address is changed, we cannot simply update the kernel
> > pagetable. On the contrary, we have to update all the pointers
> > pointing to the virtual address, which is very difficult to do.
> >
> > In order to do memory hotplug, we should prevent the kernel to use
> > hotpluggable memory.
> >
> > In ACPI, there is a table named SRAT(System Resource Affinity Table).
> > It contains system NUMA info (CPUs, memory ranges, PXM), and also a
> > flag field indicating which memory ranges are hotpluggable.
> >
> >
> > [Problem to be solved]
> >
> > At the very early time when the system is booting, we use a bootmem
> > allocator, named memblock, to allocate memory for the kernel.
> > memblock will start to work before the kernel parse SRAT, which means
> > memblock won't know which memory is hotpluggable before SRAT is
> > parsed.
> >
> > So at this time, memblock could allocate hotpluggable memory for the
> > kernel to use permanently. For example, the kernel may allocate
> > pagetables in hotpluggable memory, which cannot be freed when the
> > system is up.
> >
> > So we have to prevent memblock allocating hotpluggable memory for the
> > kernel at the early boot time.
> >
> >
> > [Earlier solutions]
> >
> > We have tried to parse SRAT earlier, before memblock is ready. To do
> > this, we also have to do ACPI_INITRD_TABLE_OVERRIDE earlier.
> > Otherwise the override tables won't be able to effect.
> >
> > This is not that easy to do because memblock is ready before direct
> > mapping is setup. So Yinghai split the ACPI_INITRD_TABLE_OVERRIDE
> > procedure into two steps: find and copy. Please refer to the following
> > patch-set:
> >         https://lkml.org/lkml/2013/6/13/587
> >
> > To this solution, tj gave a lot of comments and the following
> > suggestions.
> >
> >
> > [Suggestion from tj]
> >
> > tj mainly gave the following suggestions:
> >
> > 1. Necessary reordering is OK, but we should not rely on
> >    reordering to achieve the goal because it makes the kernel
> >    too fragile.
> >
> > 2. Memory allocated to kernel for temporary usage is OK because
> >    it will be freed when the system is up. Doing relocation
> >    for permanent allocated hotpluggable memory will make the
> >    the kernel more robust.
> >
> > 3. Need to enhance memblock to discover and complain if any
> >    hotpluggable memory is allocated to kernel.
> >
> > After a long thinking, we choose not to do the relocation for the
> > following reasons:
> >
> > 1. It's easy to find out the allocated hotpluggable memory. But
> >    memblock will merge the adjoined ranges owned by different users
> >    and used for different purposes. It's hard to find the owners.
> >
> > 2. Different memory has different way to be relocated. I think one
> >    function for each kind of memory will make the code too messy.
> >
> > 3. Pagetable could be in hotpluggable memory. Relocating pagetable
> >    is too difficult and risky. We have to update all PUD, PMD pages.
> >    And also, ACPI_INITRD_TABLE_OVERRIDE and parsing SRAT procedures
> >    are not long after pagetable is initialized. If we relocate the
> >    pagetable not long after it was initialized, the code will be
> >    very ugly.
> >
> >
> > [Solution in this patch-set]
> >
> > In this patch-set, we still do the reordering, but in a new way.
> >
> > 1. Improve memblock with flags, so that it is able to differentiate
> >    memory regions for different usage. And also a MEMBLOCK_HOTPLUG
> >    flag to mark hotpluggable memory.
> >
> > 2. When memblock is ready (memblock_x86_fill() is called), initialize
> >    acpi_gbl_root_table_list, fulfill all the ACPI tables' phys addrs.
> >    Now, we have all the ACPI tables' phys addrs provided by firmware.
> >
> > 3. Check if there is a SRAT in initrd file used to override the one
> >    provided by firmware. If so, get its phys addr.
> >
> > 4. If no override SRAT in initrd, get the phys addr of the SRAT
> >    provided by firmware.
> >
> >    Now, we have the phys addr of the to be used SRAT, the one in
> >    initrd or the one in firmware.
> >
> > 5. Parse only the memory affinities in SRAT, find out all the
> >    hotpluggable memory regions and mark them in memblock.memory with
> >    MEMBLOCK_HOTPLUG flag.
> >
> > 6. The kernel goes through the current path. Any other related parts,
> >    such as ACPI_INITRD_TABLE_OVERRIDE path, the current parsing ACPI
> >    tables pathes, global variable numa_meminfo, and so on, are not
> >    modified. They work as before.
> >
> > 7. Make memblock default allocator skip hotpluggable memory.
> >
> > 8. Introduce movablenode boot option to allow users to enable
> >    and disable this functionality.
> >
> >
> > In summary, in order to get hotpluggable memory info as early as
> > possible, this patch-set only parse memory affinities in SRAT one more
> > time right after memblock is ready, and leave all the other pathes
> > untouched. With the hotpluggable memory info, we can arrange
> > hotpluggable memory in ZONE_MOVABLE to prevent the kernel to use it.
> >
> > change log v2 RESEND -> v3:
> > 1. As Rafael and Lv Zheng suggested, split acpi global root table list
> >    initialization procedure into two steps: install and override. And
> >    do the "install" step earlier.
> 
> This looks a bit more manageable than before, but please do one more
> thing:
> Please split all of the ACPICA changes out into separate patches and put
> those patched in front of everything else.
> 
> The reason is we may need to merge them through upstream ACPICA as the
> first step (if they are accepted by the ACPICA maintainers).
> 


Yes, we (ACPICA) would like to see them all together in one place so that we can review.
Thanks,
Bob




> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
��.n������g����a����&ޖ)���)��h���&������梷�����Ǟ�m������)������^�����������v���O��zf������





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]