On Wed, Jul 31, 2013 at 11:13:07PM -0700, Lisa Du wrote:
> >On Mon, Jul 22, 2013 at 09:58:17PM -0700, Lisa Du wrote:
> >> Dear Sir:
> >> I have run into a possible dead loop in direct reclaim. After running plenty of
> >> applications, the system got into a state where memory is very
> >> fragmented: only order-0 and order-1 memory is left.
> >> Then one process requested an order-2 buffer, but it entered an endless direct
> >> reclaim. From my trace log, I can see this loop has already run over 200,000 times.
> >> Kswapd was first woken up and then went back to sleep, as it cannot rebalance
> >> memory for this order. But zone->all_unreclaimable remains 1.
> >> Direct reclaim returns no pages every time, but as
> >> zone->all_unreclaimable = 1, it loops again and again, even when
> >> zone->pages_scanned becomes very large. It blocks the process for a
> >> long time, until some watchdog thread detects this and kills the process.
> >> It is in __alloc_pages_slowpath, but it's too slow, right? It may take
> >> over 50 seconds or even more.
> >> I think this is not the expected behavior, right? Can we also add the check below
> >> to the function all_unreclaimable() to terminate this loop?
> >>
> >> @@ -2355,6 +2355,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
> >>  			continue;
> >>  		if (!zone->all_unreclaimable)
> >>  			return false;
> >> +		if (sc->nr_reclaimed == 0 && !zone_reclaimable(zone))
> >> +			return true;
> >>  	}
> >>
> >> BTW: I'm using kernel 3.4. I also searched kernel 3.9 and
> >> didn't see a possible fix for this issue. Has anyone else met this issue
> >> before? Any comment is welcome; looking forward to your reply!
> >>
> >> Thanks!
> >
> >I'd like to ask a few things.
> >
> >1. Do you have swap enabled?
> I set CONFIG_SWAP=y, but I don't actually have a swap partition, which means my swap size is 0.
> >2. Did you enable CONFIG_COMPACTION?
> No, I didn't enable it.
> >3. Could we get your zoneinfo via cat /proc/zoneinfo?
> I dumped some info from a ramdump; please review:

Thanks for the information.
You said the order-2 allocation failed, so I will assume the preferred zone
is the Normal zone, not the HighMem zone, because high-order allocations on
the kernel side aren't served from HighMem.

> crash> kmem -z
> NODE: 0  ZONE: 0  ADDR: c08460c0  NAME: "Normal"
>   SIZE: 192512  PRESENT: 182304  MIN/LOW/HIGH: 853/1066/1279

That is 712M of Normal memory (182304 present pages x 4K).

>   VM_STAT:
>         NR_FREE_PAGES: 16092

There are plenty of free pages above the high watermark (1279), but the zone
is heavily fragmented, as the information below shows. So kswapd stops
scanning this zone once its first loop iteration, done with order-2, is over.
I mean that after the first iteration kswapd retries with order-0 via

	order = sc.order = 0;
	goto loop_again;

But this time, zone_watermark_ok_safe() with testorder = 0 on the Normal zone
is always true, so scanning of the zone is skipped. It means kswapd never
sets zone->all_unreclaimable to 1.

>         NR_INACTIVE_ANON: 17
>         NR_ACTIVE_ANON: 55091
>         NR_INACTIVE_FILE: 17
>         NR_ACTIVE_FILE: 17
>         NR_UNEVICTABLE: 0
>         NR_MLOCK: 0
>         NR_ANON_PAGES: 55077

There are about 200M of anon pages and only a few file pages. You don't have
swap, so the reclaimer couldn't get far.

>         NR_FILE_MAPPED: 42
>         NR_FILE_PAGES: 69
>         NR_FILE_DIRTY: 0
>         NR_WRITEBACK: 0
>         NR_SLAB_RECLAIMABLE: 1226
>         NR_SLAB_UNRECLAIMABLE: 9373
>         NR_PAGETABLE: 2776
>         NR_KERNEL_STACK: 798
>         NR_UNSTABLE_NFS: 0
>         NR_BOUNCE: 0
>         NR_VMSCAN_WRITE: 91
>         NR_VMSCAN_IMMEDIATE: 115381
>         NR_WRITEBACK_TEMP: 0
>         NR_ISOLATED_ANON: 0
>         NR_ISOLATED_FILE: 0
>         NR_SHMEM: 31
>         NR_DIRTIED: 15256
>         NR_WRITTEN: 11981
>         NR_ANON_TRANSPARENT_HUGEPAGES: 0
>
> NODE: 0  ZONE: 1  ADDR: c08464c0  NAME: "HighMem"
>   SIZE: 69632  PRESENT: 69088  MIN/LOW/HIGH: 67/147/228
>   VM_STAT:
>         NR_FREE_PAGES: 161

Free pages (161) are below the high watermark (228), so the reclaimer
should reclaim this zone.

>         NR_INACTIVE_ANON: 104
>         NR_ACTIVE_ANON: 46114
>         NR_INACTIVE_FILE: 9722
>         NR_ACTIVE_FILE: 12263

There seems to be plenty of room to evict file pages here.
>         NR_UNEVICTABLE: 168
>         NR_MLOCK: 0
>         NR_ANON_PAGES: 46102
>         NR_FILE_MAPPED: 12227
>         NR_FILE_PAGES: 22270
>         NR_FILE_DIRTY: 1
>         NR_WRITEBACK: 0
>         NR_SLAB_RECLAIMABLE: 0
>         NR_SLAB_UNRECLAIMABLE: 0
>         NR_PAGETABLE: 0
>         NR_KERNEL_STACK: 0
>         NR_UNSTABLE_NFS: 0
>         NR_BOUNCE: 0
>         NR_VMSCAN_WRITE: 0
>         NR_VMSCAN_IMMEDIATE: 0
>         NR_WRITEBACK_TEMP: 0
>         NR_ISOLATED_ANON: 0
>         NR_ISOLATED_FILE: 0
>         NR_SHMEM: 117
>         NR_DIRTIED: 7364
>         NR_WRITTEN: 6989
>         NR_ANON_TRANSPARENT_HUGEPAGES: 0
>
> ZONE  NAME       SIZE   FREE  MEM_MAP   START_PADDR  START_MAPNR
>   0   Normal   192512  16092  c1200000            0            0
> AREA  SIZE  FREE_AREA_STRUCT  BLOCKS  PAGES
>   0     4k  c08460f0               3      3
>   0     4k  c08460f8             436    436
>   0     4k  c0846100           15237  15237
>   0     4k  c0846108               0      0
>   0     4k  c0846110               0      0
>   1     8k  c084611c              39     78
>   1     8k  c0846124               0      0
>   1     8k  c084612c             169    338
>   1     8k  c0846134               0      0
>   1     8k  c084613c               0      0
>   2    16k  c0846148               0      0
>   2    16k  c0846150               0      0
>   2    16k  c0846158               0      0
> ---- Normal zone: all orders > 1 have no free pages
>
> ZONE  NAME       SIZE   FREE  MEM_MAP   START_PADDR  START_MAPNR
>   1   HighMem   69632    161  c17e0000     2f000000       192512
> AREA  SIZE  FREE_AREA_STRUCT  BLOCKS  PAGES
>   0     4k  c08464f0              12     12
>   0     4k  c08464f8               0      0
>   0     4k  c0846500              14     14
>   0     4k  c0846508               3      3
>   0     4k  c0846510               0      0
>   1     8k  c084651c               0      0
>   1     8k  c0846524               0      0
>   1     8k  c084652c               0      0
>   2    16k  c0846548               0      0
>   2    16k  c0846550               0      0
>   2    16k  c0846558               0      0
>   2    16k  c0846560               1      4
>   2    16k  c0846568               0      0
>   5   128k  c08465cc               0      0
>   5   128k  c08465d4               0      0
>   5   128k  c08465dc               0      0
>   5   128k  c08465e4               4    128
>   5   128k  c08465ec               0      0
> ---- All other entries are zero
>
> Some other zone information I dumped from pglist_data:
> {
>   watermark = {853, 1066, 1279},
>   percpu_drift_mark = 0,
>   lowmem_reserve = {0, 2159, 2159},
>   dirty_balance_reserve = 3438,
>   pageset = 0xc07f6144,
>   lock = {
>     {
>       rlock = {
>         raw_lock = {
>           lock = 0
>         },
>         break_lock = 0
>       }
>     }
>   },
>   all_unreclaimable = 0,
>   reclaim_stat = {
>     recent_rotated = {903355, 960912},
>     recent_scanned = {932404, 2462017}
>   },
>   pages_scanned = 84231,

Most of the scanning happens in the direct reclaim path, I guess, but
direct reclaim couldn't reclaim any pages due to the lack of a swap device.
It means we have to set zone->all_unreclaimable in the direct reclaim path, too.
Does the patch below fix your problem?

From a5d82159b98f3d90c2f9ff9e486699fb4c67cced Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@xxxxxxxxxx>
Date: Thu, 1 Aug 2013 16:18:00 +0900
Subject: [PATCH] mm: set zone->all_unreclaimable in direct reclaim path

Lisa reported that there are lots of free pages in a zone, but most of
them are order-0 pages, which means the zone is heavily fragmented.
Then a high-order allocation could cause a long stall (e.g., 50 seconds)
in the direct reclaim path in an environment with no swap and no
compaction.

The reason is that kswapd can skip scanning the zone because the zone
has lots of free pages, and kswapd changes the scanning order from
high-order to order-0 after its first iteration is done, because kswapd
thinks order-0 allocation is the most important. See commit 73ce02e9
for details.

The problem is that, at the moment, only kswapd can set
zone->all_unreclaimable to 1, so the direct reclaim path would loop
forever until a ghost sets zone->all_unreclaimable to 1.

This patch makes the direct reclaim path set zone->all_unreclaimable
to avoid the infinite loop. So now we don't need a ghost.
Reported-by: Lisa Du <cldu@xxxxxxxxxxx>
Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
 mm/vmscan.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33dc256..f957e87 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2317,6 +2317,23 @@ static bool all_unreclaimable(struct zonelist *zonelist,
 	return true;
 }
 
+static void check_zones_unreclaimable(struct zonelist *zonelist,
+					struct scan_control *sc)
+{
+	struct zoneref *z;
+	struct zone *zone;
+
+	for_each_zone_zonelist_nodemask(zone, z, zonelist,
+			gfp_zone(sc->gfp_mask), sc->nodemask) {
+		if (!populated_zone(zone))
+			continue;
+		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
+			continue;
+		if (!zone_reclaimable(zone))
+			zone->all_unreclaimable = 1;
+	}
+}
+
 /*
  * This is the main entry point to direct page reclaim.
  *
@@ -2370,7 +2387,17 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 				lru_pages += zone_reclaimable_pages(zone);
 			}
 
-			shrink_slab(shrink, sc->nr_scanned, lru_pages);
+			/*
+			 * When a zone has enough order-0 free memory but
+			 * zone is heavily fragmented and we need high order
+			 * page from the zone, kswapd could skip the zone
+			 * after first iteration with high order. So, kswapd
+			 * never set the zone->all_unreclaimable to 1 so
+			 * direct reclaim path needs the check.
+			 */
+			if (!shrink_slab(shrink, sc->nr_scanned, lru_pages))
+				check_zones_unreclaimable(zonelist, sc);
+
 			if (reclaim_state) {
 				sc->nr_reclaimed += reclaim_state->reclaimed_slab;
 				reclaim_state->reclaimed_slab = 0;
-- 
1.7.9.5

-- 
Kind regards,
Minchan Kim