Re: [PATCH RESEND part2 v2 1/8] x86: get pg_data_t's memory from other node

On Mon, Jan 20, 2014 at 03:29:41PM +0800, Tang Chen wrote:
> Hi Mel,
> 
> On 01/17/2014 01:11 AM, Mel Gorman wrote:
> >On Tue, Dec 03, 2013 at 10:22:00AM +0800, Zhang Yanfei wrote:
> >>From: Yasuaki Ishimatsu<isimatu.yasuaki@xxxxxxxxxxxxxx>
> >>
> >>If the system can create a movable node in which all memory of the node
> >>is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for
> >>the node's pg_data_t. So, invoke memblock_alloc_nid(...MAX_NUMNODES) again
> >>to retry when the first allocation fails. Otherwise, the system could fail
> >>to boot.
> >>(We don't use memblock_alloc_try_nid() to retry because in this function,
> >>if the allocation fails, it will panic the system.)
> >>
> >
> >This implies that it is possible to have a configuration with a big ratio
> >difference between Normal:Movable memory. In such configurations there
> >would be a risk that the system will reclaim heavily or go OOM because
> >the kernel cannot allocate memory due to a relatively small Normal
> >zone. What protects against that? Is the user ever warned if the ratio
> >between Normal:Movable is very high?
> 
> For now, there is no way to protect against this. But on a modern
> server, it won't be that easy to run out of memory when booting, I think.
> 


Booting is a basic functional requirement and I'm more concerned about the
behaviour of the kernel when the machine is running.  If the kernel thrashes
heavily or goes OOM when a workload starts then the fact the machine booted
is not much comfort.

> The current implementation will set any node the kernel resides in as
> unhotpluggable, which means normal zone here. And on today's servers,
> especially memory hotplug servers, each node would have at least 16GB
> of memory, which is enough for the kernel to boot.
> 

Again, booting is fine but let's say it's an 8-node machine then that
implies the Normal:Movable ratio will be 1:8. All page table pages, inode,
dentries etc will have to fit in that 1/8th of memory with all the associated
costs including remote access penalties.  In extreme cases it may not be
possible to use all of memory because the management structures cannot be
allocated. Users may want the option of adjusting what this ratio is so
they can unplug some memory while not completely sacrificing performance.

Minimally, the kernel should print a big fat warning if the ratio is equal
to or higher than 1:3 Normal:Movable. That ratio selection is arbitrary. I do not
recall ever seeing any major Normal:Highmem bugs on 4G 32-bit machines so it
is a conservative choice. The last Normal:Highmem bug I remember was related
to a 16G 32-bit machine (https://bugzilla.kernel.org/show_bug.cgi?id=42578),
so a 1:15 ratio feels very optimistic for a very large machine.
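
As a rough illustration only, the check suggested above could be sketched
in userspace C as follows. The function names, the page-count parameters
and the message text are all hypothetical and not taken from any posted
patch; a kernel version would read the zone sizes from its own accounting
and use its own logging helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: return true when the Normal:Movable ratio is
 * 1:threshold or higher, i.e. movable_pages >= threshold * normal_pages.
 */
static bool movable_ratio_too_high(uint64_t normal_pages,
				   uint64_t movable_pages,
				   uint64_t threshold)
{
	assert(threshold > 0);

	/* No Normal memory at all: any Movable memory is too much. */
	if (normal_pages == 0)
		return movable_pages > 0;

	/* Divide instead of multiplying so the comparison cannot overflow. */
	return movable_pages / normal_pages >= threshold;
}

/* Hypothetical wrapper: print a warning and report whether one was issued. */
static bool warn_if_movable_ratio_high(uint64_t normal_pages,
				       uint64_t movable_pages)
{
	/* 1:3 is the arbitrary threshold proposed in the text above. */
	if (!movable_ratio_too_high(normal_pages, movable_pages, 3))
		return false;

	fprintf(stderr,
		"WARNING: Normal:Movable ratio is 1:3 or higher "
		"(%llu Normal pages, %llu Movable pages)\n",
		(unsigned long long)normal_pages,
		(unsigned long long)movable_pages);
	return true;
}
```

The threshold is passed as a parameter in the helper because the 1:3
value is an arbitrary starting point and may well need tuning.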

> We can add a patch to make it return to the original path if we run
> out of memory, which means turning off the functionality and warning
> users in the log.
> 
> What do you think?
> 

I think that will allow the machine to boot but that there still will be a
large number of bugs filed with these machines due to high Normal:Movable
ratios. The shape of the bug reports will be similar to the Normal:Highmem
ratio bugs that existed years ago.

> >The movable_node boot parameter still
> >turns the feature on and off, there appears to be no way of controlling
> >the ratio of memory other than booting with the minimum amount of memory
> >and manually hot-adding the sections to set the appropriate ratio.
> 
> For now, yes. We expect firmware and hardware to give the basic
> ratio (how much memory is hotpluggable), and the user decides how to
> arrange the memory (decide the size of normal zone and movable zone).
> 

There seem to be big gaps in the configuration options here. The user
can either ask it to be automatically assigned and have no control of
the ratio or manually hot-add the memory which is a relatively heavy
administrative burden.

I think they should be warned if the ratio is high and have an option of
specifying a ratio manually even if that means that additional nodes
will not be hot-removable.
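
To make the idea of a manual ratio concrete, here is a minimal userspace
sketch of parsing a value such as "1:3". No such option exists in the
patches under discussion; the "N:M" syntax, struct zone_ratio and
parse_zone_ratio() are assumptions made purely for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical representation of a requested Normal:Movable ratio. */
struct zone_ratio {
	unsigned long normal;
	unsigned long movable;
};

/* Parse one unsigned decimal field and advance *p past it. */
static bool parse_field(const char **p, unsigned long *val)
{
	char *end;

	/* Require a leading digit so signs and whitespace are rejected. */
	if (!isdigit((unsigned char)**p))
		return false;

	errno = 0;
	*val = strtoul(*p, &end, 10);
	if (errno == ERANGE)
		return false;

	*p = end;
	return true;
}

/*
 * Parse a string of the form "N:M". Returns false, leaving *out
 * untouched, on malformed input or when the Normal part is zero.
 */
static bool parse_zone_ratio(const char *arg, struct zone_ratio *out)
{
	struct zone_ratio r;

	assert(arg != NULL && out != NULL);

	if (!parse_field(&arg, &r.normal))
		return false;
	if (*arg++ != ':')
		return false;
	if (!parse_field(&arg, &r.movable))
		return false;
	/* Reject trailing characters and a ratio with no Normal memory. */
	if (*arg != '\0' || r.normal == 0)
		return false;

	*out = r;
	return true;
}
```

How a parsed ratio would then be applied (for example, which nodes lose
their hot-removable status) is left open here, as it is in the text.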

This is all still a kludge around the fact that node memory hot-remove
did not try and cope with full migration by breaking some of the 1:1
virt:phys mapping assumptions when hot-remove was enabled.

-- 
Mel Gorman
SUSE Labs
