Re: [RFC PATCH]: ACPI: Automatically online hot-added memory

On Thursday 11 March 2010 01:55:15 ykzhao wrote:
> On Wed, 2010-03-10 at 21:28 +0800, Prarit Bhargava wrote:
> > >
> > > Why do we need to check whether the memory is onlined before
> > > bringing the CPU online? It seems that there is no dependency
> > > between CPU online and memory online.
> > >
> > >   
> > 
> > Yakui,
> > 
> 
> Thanks for the explanation.
> 
> > Here's a deeper look into the issue.  New Intel processors have an 
> > on-die memory controller and this means that as the socket comes and 
> > goes, so does the memory "behind" the socket.
> 
> Yes, the Nehalem processor has an integrated memory controller. But it
> is not required that the hot-added memory be onlined before
> bringing up the CPU.
>     I ran the following memory-hotplug test on one machine:
>     a. Before hot-plugging memory, four CPU sockets are installed and
> all the logical CPUs are brought up (only one node has memory).
>     b. The memory is hot-plugged and then onlined so that
> it can be accessed by the system.
> 
> In the above test case the CPUs are brought up before the hot-added
> memory is onlined, and the test shows that it works well.
> 
> > 
> > i.e. with new processors it is possible that an entire node, which 
> > consists of memory and CPUs, comes and goes as the socket is enabled 
> > and disabled.
> > 
> > The CPU bringup code does node-local allocations for the CPU.  If the 
> > memory connected to the node (which is "behind" the socket) isn't 
> > online, then these allocations fail, and the CPU bringup fails.
> 
> If the CPU can't allocate memory from its own node, it can fall back
> to another node and try the allocation there. Whether this happens
> depends on the NUMA allocation policy.
Yes, and this is broken and needs fixing.
Yakui, I suspect you are missing this patch and are wrongly onlining the
CPUs to existing nodes; that would be why you do not run into "out of
memory" conditions:
0271f91003d3703675be13b8865618359a6caa1f
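
For illustration, a minimal sketch (the helper below is made up, not
code from that patch) of the allocation pattern in question: try a
node-local allocation first, then fall back to the default policy,
which may pick another node. With SLAB, the node-local attempt against
a not-yet-onlined node is exactly where things break:

#include <linux/slab.h>
#include <linux/gfp.h>

/*
 * Hypothetical helper, for illustration only: attempt a node-local
 * allocation, then fall back to the default NUMA policy.
 */
static void *alloc_local_or_fallback(size_t size, int nid)
{
	/*
	 * Node-local attempt.  With SLAB this is where bringup of a
	 * CPU on a node without onlined memory trips up.
	 */
	void *p = kmalloc_node(size, GFP_KERNEL | __GFP_NOWARN, nid);

	if (!p)
		/* kmalloc() follows the default NUMA policy and may
		 * satisfy the request from another node. */
		p = kmalloc(size, GFP_KERNEL);
	return p;
}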

I know for sure that SLAB is broken.
SLUB behaves differently, but I am not sure whether that is due to broken
CPU hotadd code (processor_core.c is also broken: you get wrong C-state
info from the BIOS tables on hot-added CPUs).

Prarit: Can you retest with SLUB and processor.max_cstate=1? That
could/should work.

AFAIK VMware injects memory into guests in the same way, so virtualized
Linux guests may behave differently.

This is a workaround for current memory management not being able to
allocate from foreign nodes. That does not mean I generally vote against
adding it: if it works reliably, why not carry a workaround until the
more complicated stuff works.
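
To make the shape of such a workaround concrete, a rough sketch (not
the actual RFC patch; the function name is invented, while add_memory(),
online_pages() and memory_add_physaddr_to_nid() are existing
interfaces):

#include <linux/memory_hotplug.h>
#include <linux/mm.h>

/*
 * Sketch only: add the hot-plugged range and online it immediately
 * instead of deferring to userspace.
 */
static int acpi_memory_add_and_online(u64 start, u64 size)
{
	int nid = memory_add_physaddr_to_nid(start);
	int ret;

	ret = add_memory(nid, start, size);
	if (ret)
		return ret;

	/* online_pages() works on page frame numbers. */
	return online_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT);
}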

One question: you also want to automatically bring up the CPUs once a CPU
hotplug event is fired, right?
The fact that the memory hotplug driver onlines the memory immediately
once notified does not guarantee that the HW/BIOS fires the memory event
first.
In theory you need logic to not bring up CPUs on memoryless nodes, to
poll/wait until the memory has been added, etc.
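
As a rough illustration of that last point, a hypothetical guard (the
function name is invented; cpu_to_node(), node_state() and cpu_up() are
existing interfaces) that refuses to bring up a CPU on a still
memoryless node so the caller can retry later:

#include <linux/cpu.h>
#include <linux/nodemask.h>
#include <linux/topology.h>
#include <linux/errno.h>

/*
 * Hypothetical guard: do not bring up a CPU whose node has no memory
 * onlined yet; let the caller retry after the memory hotadd event.
 */
static int cpu_up_if_node_has_memory(unsigned int cpu)
{
	if (!node_state(cpu_to_node(cpu), N_HIGH_MEMORY))
		return -EAGAIN;	/* retry once memory has arrived */

	return cpu_up(cpu);
}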

   Thomas
