Re: [PATCH part5 0/7] Arrange hotpluggable memory as ZONE_MOVABLE.

Tang Chen <tangchen@xxxxxxxxxxxxxx> · Tue, 13 Aug 2013 17:56:46 +0800

Hi tj,

While working on the "near kernel memory allocation" approach, I found
a few things about memblock that I need you to confirm.

1. First of all, memblock is platform independent. Different platforms
   have different ways to store kernel image address. So I don't think
   we can obtain the kernel image address on memblock side, right ?

   If so, then we need to pass kernel image address to memblock. But...

2. There are several places that call memblock_find_in_range_node() to
   allocate memory before SRAT is parsed.

   early_reserve_e820_mpc_new()
   reserve_real_mode()
   init_mem_mapping()
   setup_log_buf()
   relocate_initrd()
   acpi_initrd_override()
   reserve_crashkernel()

   There may be more that I haven't found.

   And in the future, someone may add more code that allocates memory
   before SRAT is parsed. So I don't think we should pass the kernel
   image address to each of them one by one. That would touch a lot
   of code.

So I think we need a generic way to tell memblock to allocate memory
from the kernel image end address to higher memory.

My idea is:

1. Introduce a memblock.current_limit_low to limit the lowest address
   that memblock can use.

2. Make memblock able to allocate memory from low to high.

3. Get kernel image address on x86, and set memblock.current_limit_low
   to it before SRAT is parsed. Then we achieve the goal.

4. After SRAT is parsed, reset it to 0 and make memblock allocate
   memory from high to low again.
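
To make the four steps concrete, here is a rough userspace sketch of
the lookup I have in mind. It is only a toy model, not kernel code:
struct toy_memblock, the bottom_up flag and toy_find() are hypothetical
names, and only current_limit_low corresponds to the field proposed
above. It assumes the free ranges are sorted by base address and do
not overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model of the proposal; not the kernel's memblock code. */
struct toy_region {
	uint64_t base;
	uint64_t size;
};

struct toy_memblock {
	bool bottom_up;                /* hypothetical: true = low to high */
	uint64_t current_limit_low;    /* proposed lowest usable address */
	uint64_t current_limit;        /* highest usable address (exclusive) */
	const struct toy_region *free; /* free ranges, sorted, non-overlapping */
	size_t nr_free;
};

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * Find a free range of @size bytes inside [current_limit_low, current_limit).
 * Walks the free ranges from low to high when bottom_up is set, otherwise
 * from high to low. Returns the base address, or 0 if nothing fits.
 * @align must be a power of two.
 */
static uint64_t toy_find(const struct toy_memblock *mb,
			 uint64_t size, uint64_t align)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t n = 0; n < mb->nr_free; n++) {
		size_t i = mb->bottom_up ? n : mb->nr_free - 1 - n;
		uint64_t start = mb->free[i].base;
		uint64_t end = start + mb->free[i].size;
		uint64_t cand;

		/* Clamp the free range to the allowed window. */
		if (start < mb->current_limit_low)
			start = mb->current_limit_low;
		if (end > mb->current_limit)
			end = mb->current_limit;
		if (start >= end || end - start < size)
			continue;

		cand = mb->bottom_up ? align_up(start, align)
				     : align_down(end - size, align);
		if (cand >= start && cand + size <= end)
			return cand;
	}
	return 0;
}
```

With bottom_up set and current_limit_low set to the kernel image end,
every early caller gets memory just above the kernel image without
having to pass the address itself. Clearing both afterwards restores
the usual high-to-low behaviour.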

What do you think of this, or do you have a better idea?

Thanks for your patience and help. :)

On 08/13/2013 02:14 PM, Tang Chen wrote:
On 08/13/2013 12:46 AM, Tejun Heo wrote:
......

* Adding an option to tell the kernel to try to stay away from
hotpluggable nodes is fine. I have no problem with that at all.

* The patchsets up to this point have been trying to reorder
operations somehow such that *no* memory allocation happens before
memblock is populated with hotplug information.

* However, we already *know* that the memory the kernel image is
occupying won't be removable. It's highly likely that the amount
of memory allocation before NUMA / hotplug information is fully
populated is pretty small. Also, it's highly likely that the small
amount of memory right after the kernel image is contained in the
same NUMA node, so if we allocate memory close to the kernel image,
it's likely that we don't contaminate a hotpluggable node. We're
talking about a few megs at most right after the kernel image. I
can't see how that would make any noticeable difference.

* Once hotplug information is available, allocation can happen as
usual and the kernel can report the nodes which are actually
hotpluggable - marked as hotpluggable by the firmware && didn't get
contaminated during early alloc && didn't get overflow allocations
afterwards. Note that we need such a mechanism no matter what as the
kernel image can be loaded into hotpluggable nodes and reporting
that to userland is the only thing the kernel can do for cases like
that short of denying memory unplug on such nodes.
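
The reporting condition in the last point can be sketched as a simple
predicate. This is only an illustration: struct toy_node_state and its
field names are hypothetical and do not correspond to any existing
kernel structure.

```c
#include <stdbool.h>

/* Hypothetical per-node bookkeeping; not an existing kernel structure. */
struct toy_node_state {
	bool firmware_hotpluggable;    /* marked hotpluggable by the firmware */
	bool early_alloc_contaminated; /* used by allocations before SRAT parsing */
	bool overflow_allocated;       /* received overflow allocations afterwards */
};

/* A node is reported as hotpluggable only if all three conditions hold. */
static bool toy_node_is_hotpluggable(const struct toy_node_state *n)
{
	return n->firmware_hotpluggable &&
	       !n->early_alloc_contaminated &&
	       !n->overflow_allocated;
}
```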

Hi tj, hpa, luck, yinghai,

So if all of you agree on the idea above from tj, I think
we can do it in this way. Will update the patches to allocate
memory near kernel image before SRAT is parsed.

Thanks.
