Re: [RFC PATCH]: ACPI: Automatically online hot-added memory

Prarit Bhargava <prarit@xxxxxxxxxx> · Wed, 17 Mar 2010 11:24:47 -0400

Thomas Renninger wrote:
On Thursday 11 March 2010 01:55:15 ykzhao wrote:

On Wed, 2010-03-10 at 21:28 +0800, Prarit Bhargava wrote:

Why do we need to see whether the memory is onlined before bringing cpu
to online state? It seems that there is no dependency between cpu online
and memory online.

Yakui,

Thanks for the explanation.

Here's a deeper look into the issue.  New Intel processors have an 
on-die memory controller and this means that as the socket comes and 
goes, so does the memory "behind" the socket.

Yes. The Nehalem processor has an integrated memory controller. But it
is not required that the hot-added memory should be onlined before
bringing up CPU.
    I ran the following memory-hotplug test on one machine.
    a. Before hot-plugging memory, four CPU sockets are installed and
all the logical CPUs are brought up. (Only one node has memory.)
    b. The memory is hot-plugged and then the memory is onlined so that
it can be accessed by the system.

In the above test case the CPU is brought up before onlining the
hot-added memory. And the test shows that it works well.

i.e., with new processors it is possible that an entire node which 
consists of memory and cpus comes and goes with the socket enable and 
disable.

The cpu bringup code does local node allocations for the cpu.  If the 
memory connected to the node (which is "behind" the socket) isn't 
online, then these allocations fail, and then the cpu bringup fails.

If the CPU can't allocate the memory from its own node, it can fall back
to another node and see whether the memory can be allocated there. This
depends on the NUMA allocation policy.

Yes, and this is broken and needs fixing.
Yakui, I expect you are missing this patch and wrongly online the cpus to
existing nodes, which is why you do not run into "out of memory" conditions:
0271f91003d3703675be13b8865618359a6caa1f

FWIW, I'm working on a 2.6.32 based tree, but I have that patch in (as 
well as several others).  I'm also running the latest upstream (tip as 
of this morning).  The issues I see in my 2.6.32 based tree are the same 
AFAICT that I see upstream: a cpu comes online and attempts to make a 
per_node allocation which fails, so the cpu bringup fails.
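
To make the failure mode above concrete, here is a toy userspace model of a node-local allocation during cpu bringup. It is only a sketch and not kernel code: alloc_for_cpu(), NodeAllocError and the online_pages table are hypothetical names invented for this illustration, and the this_node_only flag is merely assumed to behave like a strict node-local request that never falls back.

```python
class NodeAllocError(Exception):
    """Raised when the toy allocator cannot satisfy a request."""


def alloc_for_cpu(cpu_node, online_pages, pages, this_node_only):
    """Return the node that would satisfy an allocation made during cpu bringup.

    cpu_node       -- node the cpu being brought up belongs to
    online_pages   -- dict: node id -> free pages of *onlined* memory
    pages          -- number of pages requested
    this_node_only -- if True, never fall back to another node (assumption:
                      models a strict per-node allocation)
    """
    # First choice is always the cpu's own node.
    if online_pages.get(cpu_node, 0) >= pages:
        return cpu_node
    if this_node_only:
        # Memory behind the socket is not online: the allocation fails,
        # and in the scenario described above so does the cpu bringup.
        raise NodeAllocError("node %d has no online memory" % cpu_node)
    # Otherwise fall back to any other node with enough onlined memory.
    for node in sorted(online_pages):
        if online_pages[node] >= pages:
            return node
    raise NodeAllocError("no node can satisfy %d pages" % pages)
```

With node 1 hot-added but not yet onlined, alloc_for_cpu(1, {0: 100, 1: 0}, 10, True) raises, while the same call with this_node_only=False lands on node 0. Once node 1's memory is onlined, both variants return node 1.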

I know for sure that slab is broken.

Yes, but I believe Andi Kleen has added some patches that resolve (at 
least some of) the issues.  I've been using slab (and occasionally 
testing slub).

slub behaves differently, but I am not sure whether this is due to wrong CPU
hotadd code (processor_core.c is also broken and you get wrong C-state info
from BIOS tables on hotadded CPUs).

Prarit: Can you retest with slub and processor.max_cstate=1? This could/should
work.

Two tests:

1.  WITHOUT my auto online patch, the cpus fail to come into service 
because of a per_node allocation failure.
2.  WITH my auto online patch, the cpus come into service

... I have NOT done any sort of testing to see if the cpus are really 
live ;)

AFAIK vmware injects memory in the same way into clients, so you may have
different behavior of virtualized Linux clients.

Are you referring to the vmware ballooning driver (or whatever they call 
it)?  IIRC (and I'm not saying I do ;) ), vmware adds memory and 
automatically onlines it in a guest.  I'm not sure how that's done -- it 
could be via udev.

I'll see if anyone here knows.
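
For reference, onlining from userspace (whether triggered by udev or by hand) comes down to writing "online" to a memory block's sysfs state file. The sketch below shows that step only. online_offline_blocks() is a hypothetical helper written for this illustration; the root argument exists so it can be pointed at a scratch directory, and writing to the real sysfs tree needs root. How vmware actually does it is not something this sketch claims to show.

```python
import glob
import os

# Memory blocks appear as memoryN directories under this sysfs path.
SYSFS_MEMORY = "/sys/devices/system/memory"


def online_offline_blocks(root=SYSFS_MEMORY):
    """Write 'online' to every memory block whose state reads 'offline'.

    Returns the names of the blocks that were onlined (e.g. ['memory32']).
    """
    onlined = []
    pattern = os.path.join(root, "memory[0-9]*", "state")
    for state_path in sorted(glob.glob(pattern)):
        with open(state_path) as f:
            state = f.read().strip()
        if state != "offline":
            continue  # already online (or in transition): leave it alone
        with open(state_path, "w") as f:
            f.write("online")
        onlined.append(os.path.basename(os.path.dirname(state_path)))
    return onlined
```

A udev rule or a small daemon could call something like this on each memory add event; the auto online patch under discussion does the equivalent inside the kernel instead.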

One question: You also want to automatically add the CPUs once a CPU hotplug
event has fired, right?

Yes, that's correct.

The fact that the memory hotplug driver adds the memory immediately once
notified does not ensure that the HW/BIOS fires this event first.

Is that right?  FWIW I always see this sequence of events:

ACPI memory added
ACPI cpus added

I never see them out-of-order.  OTOH, I'm only testing on Intel's latest 
platform so maybe there are some older systems that don't do this in 
that order. 

Theoretically you need logic to not add CPUs to memoryless nodes, to poll/wait
until memory has been added, etc.

Theoretically yes -- but are there any systems that generate cpu add 
events before memory add events?
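
If such systems do exist, the deferral logic discussed above could look roughly like the toy model below. It is a sketch of the ordering idea only: HotplugSequencer and its methods are hypothetical names, the events are plain method calls rather than ACPI notifications, and a real implementation would also have to deal with timeouts and nodes that never receive memory.

```python
class HotplugSequencer:
    """Toy model: hold back cpus on memoryless nodes until memory arrives."""

    def __init__(self):
        self.nodes_with_memory = set()  # nodes whose memory has been added
        self.online_cpus = []           # cpus brought up, in bringup order
        self.pending = {}               # node id -> cpus waiting for memory

    def memory_added(self, node):
        """Handle a memory add event for a node."""
        self.nodes_with_memory.add(node)
        # Release every cpu that was parked waiting for this node's memory.
        for cpu in self.pending.pop(node, []):
            self.online_cpus.append(cpu)

    def cpu_added(self, cpu, node):
        """Handle a cpu add event for a cpu belonging to the given node."""
        if node in self.nodes_with_memory:
            self.online_cpus.append(cpu)
        else:
            # Memoryless node: defer bringup instead of failing it.
            self.pending.setdefault(node, []).append(cpu)
```

In the order observed so far (memory added, then cpus added) the pending table stays empty, so the deferral path only matters on a platform that reverses the two events.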

Thanks for the input Thomas :)

P.
   Thomas
