HOLES_IN_ZONE...

David Miller <davem@xxxxxxxxxxxxx> · Wed, 04 Feb 2009 22:26:51 -0800 (PST)

So I've been fighting mysterious crashes on my main sparc64 devel
machine.  What's happening is that the assertion in
mm/page_alloc.c:move_freepages() is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

So I did a lot (and I do mean _A LOT_) of digging.  And it seems that
unless you set HOLES_IN_ZONE you have to make sure that all of the
memmap regions of free space in a zone begin and end on an HPAGE_SIZE
boundary (the requirement used to be that it had to be MAX_ORDER
sized).

Well, this assumption enterred the tree back in 2005 (!!!) from
the following commit in the history-2.6 tree:

commit 69fba2dd0335abec0b0de9ac53d5bbb67c31fc60
Author: Kamezawa Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Date:   Fri Jan 7 22:01:35 2005 -0800

    [PATCH] no buddy bitmap patch revisit: for mm/page_alloc.c

At the time only IA64 had HOLES_IN_ZONE added, this happens in the
commit right after the above one.

So in theory Sparc64 has been broken since that commit and subject to
potential memory corruption and other unnice things.

I also noticed that when S390 got virtual memmap support, it acquired
the HOLES_IN_ZONE setting as well, in this commit:

commit f4eb07c17df2e6cf9bd58bfcd9cc9e05e9489d07
Author: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
Date:   Fri Dec 8 15:56:07 2006 +0100

    [S390] Virtual memmap for s390.

This is confusing.  Is HOLES_IN_ZONE only required when virtual mmap
is being used?  If so, why is that?  This is a very poorly documented
flag, and I'm saying this after pouring over every commit referencing
it.

Later this HOLES_IN_ZONE requirement was removed on s390 by commit:

commit 9f4b0ba81f158df459fa2cfc98ab1475c090f29c
Author: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
Date:   Sat Jan 26 14:11:02 2008 +0100

    [S390] Get rid of HOLES_IN_ZONE requirement.

Anyways...

The point of this email is, do I really need to set this thing on
sparc64?  I've never seen this check triggered before, it only seems
to have started triggering in 2.6.27 or so but I obviously can't find
anything that would influence something of this nature.

It takes a lot of stressing to get that specific chunk of pages to
attempt to be freed up in a group like that :-/

As a suggestion, it would have been a lot more pleasant if the code
validated this requirement (in the !HOLES_IN_ZONE case) at boot time
instead of after 2 hours of stress testing :-(
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html