On Wed, Mar 23, 2011 at 2:21 PM, KOSAKI Motohiro
<kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> Hi Minchan,
>
>> > zone->all_unreclaimable and zone->pages_scanned are neigher atomic
>> > variables nor protected by lock. Therefore a zone can become a state
>> > of zone->page_scanned=0 and zone->all_unreclaimable=1. In this case,
>>
>> Possible although it's very rare.
>
> Can you test by yourself andrey's case on x86 box? It seems
> reprodusable.
>
>> > current all_unreclaimable() return false even though
>> > zone->all_unreclaimabe=1.
>>
>> The case is very rare since we reset zone->all_unreclaimabe to zero
>> right before resetting zone->page_scanned to zero.
>> But I admit it's possible.
>
> Please apply this patch and run oom-killer. You may see following
> pages_scanned:0 and all_unreclaimable:yes combination. likes below.
> (but you may need >30min)
>
>     Node 0 DMA free:4024kB min:40kB low:48kB high:60kB active_anon:11804kB
>     inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB
>     isolated(anon):0kB isolated(file):0kB present:15676kB mlocked:0kB
>     dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
>     slab_unreclaimable:0kB kernel_stack:0kB pagetables:68kB unstable:0kB
>     bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>
>
>>
>>         CPU 0                                   CPU 1
>> free_pcppages_bulk                      balance_pgdat
>>         zone->all_unreclaimabe = 0
>>                                         zone->all_unreclaimabe = 1
>>         zone->pages_scanned = 0
>> >
>> > Is this ignorable minor issue? No. Unfortunatelly, x86 has very
>> > small dma zone and it become zone->all_unreclamble=1 easily. and
>> > if it becase all_unreclaimable, it never return all_unreclaimable=0
>>          ^^^^^ it's very important verb.    ^^^^^ return? reset?
>>
>>     I can't understand your point due to the typo. Please correct the typo.
>>
>> > beucase it typicall don't have reclaimable pages.
>>
>> If DMA zone have very small reclaimable pages or zero reclaimable pages,
>> zone_reclaimable() can return false easily so all_unreclaimable() could return
>> true. Eventually oom-killer might works.
>
> The point is, vmscan has following all_unreclaimable check in several place.
>
>             if (zone->all_unreclaimable && priority != DEF_PRIORITY)
>                 continue;
>
> But, if the zone has only a few lru pages, get_scan_count(DEF_PRIORITY) return
> {0, 0, 0, 0} array. It mean zone will never scan lru pages anymore. therefore
> false negative smaller pages_scanned can't be corrected.
>
> Then, false negative all_unreclaimable() also can't be corrected.
>
>
> btw, Why get_scan_count() return 0 instead 1? Why don't we round up?
> Git log says it is intentionally.
>
>     commit e0f79b8f1f3394bb344b7b83d6f121ac2af327de
>     Author: Johannes Weiner <hannes@xxxxxxxxxxxx>
>     Date:   Sat Oct 18 20:26:55 2008 -0700
>
>         vmscan: don't accumulate scan pressure on unrelated lists
>
>>
>> In my test, I saw the livelock, too so apparently we have a problem.
>> I couldn't dig in it recently by another urgent my work.
>> I think you know root cause but the description in this patch isn't enough
>> for me to be persuaded.
>>
>> Could you explain the root cause in detail?
>
> If you have an another fixing idea, please let me know. :)
>
>
>

Okay. I got it.

The problem is as follows.
Because of the race between free_pcppages_bulk and balance_pgdat, it is
possible to end up with zone->all_unreclaimable = 1 and
zone->pages_scanned = 0.
The DMA zone has few LRU pages, and with no swap and heavy memory pressure
there could be just a single page in the inactive file list, as in your
example. (Anon LRU pages don't matter on a non-swap system.)
In such a case, shrink_zones doesn't scan the page at all until priority
becomes 0, because get_scan_count does scan >>= priority (the result is
mostly zero).
And even when priority becomes 0, nr_scan_try_batch returns zero until the
saved pages reach 32.
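The arithmetic described above can be illustrated with a small userspace
model in C. This is only a sketch, not the kernel code: it assumes
DEF_PRIORITY is 12 and a batch threshold of 32 (the numbers used in this
thread), it ignores the anon/file fraction scaling that get_scan_count also
applies, and sweeps_until_first_scan() is a hypothetical helper introduced
purely for illustration.

```c
/* Userspace model only; constants follow the numbers quoted in this thread. */
#define DEF_PRIORITY     12   /* reclaim starts at this priority and counts down */
#define SWAP_CLUSTER_MAX 32   /* batch threshold ("saved pages become 32") */

/*
 * Model of the batching step: accumulate the requested scan count and
 * release it only once the accumulated total reaches the threshold.
 */
static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
                                       unsigned long *nr_saved_scan)
{
    unsigned long nr;

    *nr_saved_scan += nr_to_scan;
    nr = *nr_saved_scan;

    if (nr >= SWAP_CLUSTER_MAX)
        *nr_saved_scan = 0;
    else
        nr = 0;

    return nr;
}

/*
 * Hypothetical helper: count how many full priority sweeps
 * (DEF_PRIORITY down to 0) pass before the model scans anything.
 * Returns -1 if nothing is scanned within max_sweeps.
 */
static int sweeps_until_first_scan(unsigned long lru_pages, int max_sweeps)
{
    unsigned long saved = 0;

    for (int sweep = 1; sweep <= max_sweeps; sweep++) {
        for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
            /* The shift is zero for a tiny list until priority is 0. */
            unsigned long scan = lru_pages >> priority;

            if (nr_scan_try_batch(scan, &saved) > 0)
                return sweep;
        }
    }
    return -1;
}
```

In this model, sweeps_until_first_scan(1, 1000) returns 32: a single LRU page
contributes one saved page per sweep, and only at priority 0. A list of 4096
pages is scanned within the first sweep.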
So, to scan that page, we need at least 32 iterations of priority 12..0.
If the system has a fork bomb, that is almost a livelock.

If this is right, how about this?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 148c6e6..34983e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1973,6 +1973,9 @@ static void shrink_zones(int priority, struct zonelist *zonelist,
 
 static bool zone_reclaimable(struct zone *zone)
 {
+	if (zone->all_unreclaimable)
+		return false;
+
 	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
 }
 

-- 
Kind regards,
Minchan Kim