Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable()

----- Original Message -----
> On 03/05/2011 06:20 PM, Minchan Kim wrote:
> > On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> >> Check zone->all_unreclaimable in all_unreclaimable(); otherwise the
> >> kernel may hang, because shrink_zones() will do nothing while
> >> all_unreclaimable() reports that the zone still has reclaimable pages.
> >>
> >> do_try_to_free_pages()
> >> 	shrink_zones()
> >> 		 for_each_zone
> >> 			if (zone->all_unreclaimable)
> >> 				continue
> >> 	if !all_unreclaimable(zonelist, sc)
> >> 		return 1
> >>
> >> __alloc_pages_slowpath()
> >> retry:
> >> 	did_some_progress = do_try_to_free_pages(page)
> >> 	...
> >> 	if (!page && did_some_progress)
> >> 		goto retry;
> >>
> >> Signed-off-by: Andrey Vagin <avagin@xxxxxxxxxx>
> >> ---
> >>   mm/vmscan.c | 2 ++
> >>   1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index 6771ea7..1c056f7 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
> >>
> >>   	for_each_zone_zonelist_nodemask(zone, z, zonelist,
> >>   			gfp_zone(sc->gfp_mask), sc->nodemask) {
> >> +		if (zone->all_unreclaimable)
> >> +			continue;
> >>   		if (!populated_zone(zone))
> >>   			continue;
> >>   		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> >
> > zone_reclaimable checks it. Isn't it enough?
> I sent one more patch, "[PATCH] mm: skip zombie in OOM-killer".
> These two patches are enough.
> > Does the hang really happen, or did you find it by code review?
> Yes. You can reproduce it with the help of the attached Python
> program. It's not very clever :)
> It performs the following actions in a loop:
> 1. fork
> 2. mmap
> 3. touch memory
> 4. read memory
> 5. munmap
> 
> >> --
> >> 1.7.1
I have tested this on the latest mainline kernel using the attached
reproducer; the system still hung or deadlocked after OOM. The whole OOM
trace is here:
http://people.redhat.com/qcai/oom.log

Did I miss anything?
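
For reference, the retry loop from the patch description can be modeled in a few lines of Python. This is only a toy sketch, not kernel code: the function names mirror the ones in the description, but `Zone`, its `looks_reclaimable` field (standing in for whatever zone_reclaimable() would return), the `patched` switch and the `max_retries` cap are assumptions made up for illustration. Page allocation is assumed to fail every time.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    all_unreclaimable: bool  # stands in for zone->all_unreclaimable
    looks_reclaimable: bool  # stands in for the result of zone_reclaimable(zone)


def shrink_zones(zones):
    """Return a toy count of reclaimed pages; flagged zones are skipped."""
    reclaimed = 0
    for zone in zones:
        if zone.all_unreclaimable:
            continue
        reclaimed += 1  # toy assumption: one page per zone actually scanned
    return reclaimed


def all_unreclaimable(zones, patched):
    """True if no zone is considered reclaimable."""
    for zone in zones:
        if patched and zone.all_unreclaimable:
            continue  # the two lines added by the patch
        if zone.looks_reclaimable:
            return False
    return True


def do_try_to_free_pages(zones, patched):
    reclaimed = shrink_zones(zones)
    if reclaimed:
        return reclaimed
    if not all_unreclaimable(zones, patched):
        return 1  # reports progress although nothing was reclaimed
    return 0


def slowpath_retries(zones, patched, max_retries=1000):
    """Count retries of the slow path; the cap exists only in this model."""
    retries = 0
    while retries < max_retries:
        if not do_try_to_free_pages(zones, patched):
            break
        retries += 1
    return retries
```

With a single zone that is flagged all_unreclaimable but still looks reclaimable, `slowpath_retries(zones, patched=False)` runs into the cap, while `patched=True` stops on the first pass. Whether this matches the hang in the trace above is a separate question.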
# Reproducer (Python 2): fork in a loop; each parent maps, touches and
# verifies a random amount of anonymous memory.
import sys, mmap, os
import random

def info(msg):
	pid = os.getpid()
	print >> sys.stderr, "%s: %s" % (pid, msg)
	sys.stderr.flush()



def memory_loop(cmd = "a"):
	"""
	Walk the mapping one 4 KiB page at a time.
	cmd may be:
		"c": check that each page holds the expected byte
		anything else: write the expected byte to each page
	"""
	for j in xrange(0, mem_size):
		if cmd == "c":
			if f[j<<12] != chr(j % 255):
				info("Data corruption")
				sys.exit(1)
		else:
			f[j<<12] = chr(j % 255)

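# Every process keeps looping and forking, so the number of processes
# grows without bound.
while True:
	pid = os.fork()
	if (pid != 0):
		# Parent: map up to 56 * 4096 pages (about 896 MiB), touch every
		# page, verify it, then unmap. The child skips this block and
		# forks again on its next iteration.
		mem_size = random.randint(0, 56 * 4096)
		f = mmap.mmap(-1, mem_size << 12, mmap.MAP_ANONYMOUS|mmap.MAP_PRIVATE)
		memory_loop()
		memory_loop("c")
		f.close()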
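
The attachment above is Python 2. For anyone who only has Python 3 at hand, the mmap/touch/read/unmap part of one iteration could look roughly like the sketch below. It deliberately leaves out the fork loop and the random sizing, so it will not reproduce the hang by itself. The names `touch_and_check` and `PAGE_SHIFT` are made up here, and the code assumes a Unix system and at least one page.

```python
import mmap

PAGE_SHIFT = 12  # 4 KiB pages, matching the `<< 12` in the attachment


def touch_and_check(n_pages):
    """Map n_pages anonymous pages, write one byte per page, then verify.

    Returns True if every page reads back the byte that was written.
    n_pages must be at least 1.
    """
    buf = mmap.mmap(-1, n_pages << PAGE_SHIFT,
                    mmap.MAP_ANONYMOUS | mmap.MAP_PRIVATE)
    try:
        # Touch: the first byte of each page gets a page-dependent value.
        for j in range(n_pages):
            buf[j << PAGE_SHIFT] = j % 255
        # Read back: in Python 3, indexing an mmap yields an int.
        for j in range(n_pages):
            if buf[j << PAGE_SHIFT] != j % 255:
                return False
        return True
    finally:
        buf.close()  # unmaps the region
```

Calling `touch_and_check(16)` maps 64 KiB, which is small enough to run safely on any machine.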
