Re: problems with memory hotplug/remove on 3.0.1

On 18 Oct 11 15:27, Larry Bassel wrote:
> We have encountered two problems with memory hotplug/hotremove
> in 3.0.1 -- this is a port of memory hotplug to ARM with a few
> small changes noted below.
> 
> Neither of these occurred on a similar 2.6.38-based port
> we did to the same hardware.
> 
> The memory is essentially two 512M memory banks: the lower
> is always on, and the upper is the one we are powering on
> and off. ARCH_POPULATES_NODE_MAP was ported to ARM,
> and a small change was made to ensure that
> the movable zone could be placed exactly where desired
> (movablecore= does not allow this, and it must be specified
> on the command line, whereas we don't know where the movable
> zone must be until the kernel starts coming up).
> Also, the upper 512M is forced to be highmem, as
> the movable zone must come from the highest physical
> memory zone (of course highmem may be larger than
> 512M, just not smaller).
> 
> 1. If highmem is set to start at exactly 512M, then
> all of highmem is used up when forming the movable
> zone. This seems to confuse the memory management
> subsystem (page reclaim?) because although the memory
> hotremove of the upper 512M succeeds, running a command
> that takes a pagefault after hotremove causes
> the system to hang:
> 
> try_to_free_pages
> __alloc_pages_nodemask
> do_wp_page
> handle_pte_fault
> handle_mm_fault
> do_page_fault
> 
> try_to_free_pages() is called repeatedly (forever), making no
> apparent progress. After some experimentation, I
> discovered that making the highmem zone at least 5M
> larger than the 512M movable zone appears to make the
> problem disappear.
> 
> I can (if I don't run anything that provokes the
> above bug) hotplug the 512M back in, and then this
> problem does not occur.
> 
> I've seen some discussion about very small zones causing
> problems. Is what we are seeing a known problem?
> Is there a known fix (or at least a patch we could try)?
> 
> 2. Assuming the workaround we have for #1 is present,
> we see memory hotremove occasionally fail. This seems
> to (after a few seconds) cause init's state to become
> corrupted, provoking a panic -- sometimes (but not always)
> init's PC is 0. Sometimes additional (not always the
> same) processes also unexpectedly exit after the
> memory hotremove attempt.

Sorry to reply to my own post, but the second problem
was due to an error on our part -- I still believe
the first one is real and would appreciate help with it.
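
To make the failing layout in problem 1 concrete, the zone arithmetic
can be sketched roughly as below. This is only an illustration, not
kernel code: the helper name and the assumption that the movable zone
is carved entirely out of highmem are mine, and the 1024M total simply
reflects the two 512M banks described above.

```python
MiB = 1 << 20

def highmem_left_after_movable(total_bytes, highmem_start, movable_bytes):
    """Return the bytes of highmem that remain outside the movable zone.

    Hypothetical helper: assumes highmem spans [highmem_start, total_bytes)
    and that the movable zone is taken entirely from highmem.
    """
    highmem_bytes = total_bytes - highmem_start
    if movable_bytes > highmem_bytes:
        raise ValueError("movable zone does not fit in highmem")
    return highmem_bytes - movable_bytes

total = 1024 * MiB     # two 512M banks
movable = 512 * MiB    # the upper bank, which gets hot-removed

# Failing case: highmem starts at exactly 512M, so nothing is left over.
print(highmem_left_after_movable(total, 512 * MiB, movable) // MiB)

# Workaround case: highmem is 5M larger than the movable zone.
print(highmem_left_after_movable(total, 507 * MiB, movable) // MiB)
```

In the first case highmem has no pages outside the movable zone once
the upper bank is removed; in the second, 5M of highmem remains, which
matches the smallest margin that appeared to avoid the hang.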

Thanks.

Larry

-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
