On 06/20/2012 07:17 AM, KOSAKI Motohiro wrote:
> (6/17/12 8:43 PM), Minchan Kim wrote:
>> On 06/17/2012 02:48 AM, Aaditya Kumar wrote:
>>
>>> On Fri, Jun 15, 2012 at 12:57 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>>>
>>>>>
>>>>> pgdat_balanced() doesn't recognize zones. Therefore kswapd may sleep
>>>>> if node has multiple zones. Hm ok, I realized my description was
>>>>> slightly misleading. priority 0 is not needed. balance_pgdat() calls
>>>>> pgdat_balanced() every priority. Most easy case is, movable zone has
>>>>> a lot of free pages and normal zone has no reclaimable page.
>>>>>
>>>>> btw, current pgdat_balanced() logic seems not correct. kswapd should
>>>>> sleep only if every zone has more free pages than high water mark
>>>>> _and_ 25% of present pages in node are free.
>>>>>
>>>>
>>>>
>>>> Sorry. I can't understand your point.
>>>> Current kswapd doesn't sleep if relevant zones don't have free pages above high watermark.
>>>> It seems I am missing your point.
>>>> Please anybody correct me.
>>>
>>> Since currently direct reclaim is given up based on
>>> zone->all_unreclaimable flag,
>>> so for e.g in one of the scenarios:
>>>
>>> Lets say system has one node with two zones (NORMAL and MOVABLE) and we
>>> hot-remove all the pages of the MOVABLE zone.
>>>
>>> While migrating pages during memory hot-unplugging, the allocation function
>>> (for new page to which the page in MOVABLE zone would be moved) can end up
>>> looping in direct reclaim path for ever.
>>>
>>> This is so because when most of the pages in the MOVABLE zone have
>>> been migrated,
>>> the zone now contains lots of free memory (basically above low watermark)
>>> BUT all are in MIGRATE_ISOLATE list of the buddy list.
>>>
>>> So kswapd() would not balance this zone as free pages are above low watermark
>>> (but all are in isolate list). So zone->all_unreclaimable flag would
>>> never be set for this zone
>>> and allocation function would end up looping forever.
>>> (assuming the zone NORMAL is
>>> left with no reclaimable memory)
>>>
>>
>>
>> Thanks a lot, Aaditya! Scenario you mentioned makes perfect.
>> But I don't see it's a problem of kswapd.
>>
>> a5d76b54 made new migration type 'MIGRATE_ISOLATE' which is very irony type because there are many free pages in free list
>> but we can't allocate it. :(
>> It doesn't reflect right NR_FREE_PAGES while many places in the kernel use NR_FREE_PAGES to trigger some operation.
>> Kswapd is just one of them confused.
>> As right fix of this problem, we should fix hot plug code, IMHO which can fix CMA, too.
>>
>> This patch could make inconsistency between NR_FREE_PAGES and SumOf[free_area[order].nr_free]
>> and it could make __zone_watermark_ok confuse so we might need to fix move_freepages_block itself to reflect
>> free_area[order].nr_free exactly.
>>
>> Any thought?
>>
>> Side Note: I still need KOSAKI's patch with fixed description regardless of this problem because set zone->all_unreclaimable of only kswapd is very fragile.
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 4403009..19de56c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5593,8 +5593,10 @@ int set_migratetype_isolate(struct page *page)
>>
>>  out:
>>         if (!ret) {
>> +               int pages_moved;
>>                 set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> -               move_freepages_block(zone, page, MIGRATE_ISOLATE);
>> +               pages_moved = move_freepages_block(zone, page, MIGRATE_ISOLATE);
>> +               __mod_zone_page_state(zone, NR_FREE_PAGES, -pages_moved);
>>         }
>>
>>         spin_unlock_irqrestore(&zone->lock, flags);
>> @@ -5607,12 +5609,14 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>  {
>>         struct zone *zone;
>>         unsigned long flags;
>> +       int pages_moved;
>>         zone = page_zone(page);
>>         spin_lock_irqsave(&zone->lock, flags);
>>         if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>>                 goto out;
>>         set_pageblock_migratetype(page, migratetype);
>> -       move_freepages_block(zone, page, migratetype);
>> +       pages_moved = move_freepages_block(zone, page, migratetype);
>> +       __mod_zone_page_state(zone, NR_FREE_PAGES, pages_moved);
>>  out:
>>         spin_unlock_irqrestore(&zone->lock, flags);
>>  }
>
> Unfortunately, this doesn't work. there are two reasons. 1) when memory hotplug occurs, we have
> two scenarios. a) free page -> page block change into isolate b) page block change into isolate
> -> free page. The above patch only cares about scenario (a). Thus it leads to a confusing NR_FREE_PAGES value.
> _if_ we put a new branch in free page hotpath, we can solve scenario (b). but I don't like it. because of,
> zero hotpath overhead is one of memory hotplug design principle. 2) even if we can solve above issue,

Yeb. Aaditya already pointed out.
And I just sent other patch.
Let's talk about this problem on another thread because it's not a direct/background reclaim problem.
http://lkml.org/lkml/2012/6/20/30

> all_unreclaimable logic still broken.
> because of, __alloc_pages_slowpath() wakes up kswapd only once and
> doesn't wake it up in the "goto rebalance" path. But, wake_all_kswapd() is racy and has no guarantee to wake up
> kswapd. It means direct reclaim should work fine w/o background reclaim.

We can fix it easily in the direct reclaim path, but I think your approach still makes sense
because the current scheme of setting zone_unreclaimable is very fragile against livelock.
So if you send your patch again with a rewritten description, I have no objection.

Thanks.

-- 
Kind regards,
Minchan Kim
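
The accounting question discussed in this thread can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not kernel code: struct toy_zone and every toy_* function are hypothetical names made up for illustration, and the single counter field merely stands in for NR_FREE_PAGES. It tries to show three things: a check that trusts the counter passes although nothing is allocatable once the free pages sit on the isolate list; subtracting the moved pages at isolation time, as the quoted diff does, covers scenario (a); and pages freed after the block was isolated, scenario (b), skew the counter again unless the free path is changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */
struct toy_zone {
	long counter;       /* stands in for the NR_FREE_PAGES counter */
	long free_total;    /* pages on any free list, isolated or not */
	long free_isolated; /* free pages on the MIGRATE_ISOLATE list */
	long high_wmark;    /* high watermark used by the balance check */
};

/* Pages an allocation could really take: isolated ones are off limits. */
long toy_allocatable(const struct toy_zone *z)
{
	return z->free_total - z->free_isolated;
}

/* Balance check that trusts the counter, not the per-list contents. */
bool toy_watermark_ok(const struct toy_zone *z)
{
	return z->counter > z->high_wmark;
}

/* Scenario (a): n already-free pages are moved to the isolate list. */
void toy_isolate_free_pages(struct toy_zone *z, long n, bool adjust_counter)
{
	assert(n <= toy_allocatable(z));
	z->free_isolated += n;
	if (adjust_counter)
		z->counter -= n; /* the adjustment the quoted diff adds */
}

/*
 * Scenario (b): pages are freed into a block that is already isolated.
 * Assumption taken from the thread: the free path has no isolate branch,
 * so the counter grows even though the pages are not allocatable.
 */
void toy_free_into_isolated(struct toy_zone *z, long n)
{
	z->free_total += n;
	z->free_isolated += n;
	z->counter += n;
}

/* Print what the counter-based check sees next to the real situation. */
void toy_report(const char *label, const struct toy_zone *z)
{
	printf("%-28s counter=%ld allocatable=%ld watermark_ok=%d\n",
	       label, z->counter, toy_allocatable(z), toy_watermark_ok(z));
}

/* Walk through the three cases described above. */
void toy_demo(void)
{
	struct toy_zone z = { 1000, 1000, 0, 100 };

	toy_isolate_free_pages(&z, 1000, false);
	toy_report("isolate, no adjustment:", &z);

	z = (struct toy_zone){ 1000, 1000, 0, 100 };
	toy_isolate_free_pages(&z, 1000, true);
	toy_report("isolate, counter adjusted:", &z);

	toy_free_into_isolated(&z, 500);
	toy_report("then free into isolated:", &z);
}
```

Calling toy_demo() from a main() prints the three cases side by side. The model deliberately ignores orders, per-cpu lists and locking; it is only meant to make the counter-versus-list mismatch concrete.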