On Fri, Jun 22, 2012 at 12:52 PM, KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxx> wrote:
>> Let me summarize again.
>>
>> The problem:
>>
>> When hotplug offlining happens on zone A, it starts to free pages as MIGRATE_ISOLATE type in buddy.
>> (MIGRATE_ISOLATE is a very ironic type because the pages are apparently in buddy but we can't allocate them.)
>> When a memory shortage happens during hotplug offlining, the current task starts to reclaim, then wakes up kswapd.
>> Kswapd checks the watermark, then goes to sleep BECAUSE the current zone_watermark_ok_safe doesn't consider
>> the MIGRATE_ISOLATE freed page count. The current task continues to reclaim in the direct reclaim path without kswapd's help.
>> The problem is that zone->all_unreclaimable is set only by kswapd, so the current task would loop forever
>> like below.
>>
>> __alloc_pages_slowpath
>> restart:
>>         wake_all_kswapd
>> rebalance:
>>         __alloc_pages_direct_reclaim
>>                 do_try_to_free_pages
>>                         if global_reclaim && !all_unreclaimable
>>                                 return 1; /* It means we did did_some_progress */
>>         skip __alloc_pages_may_oom
>>         should_alloc_retry
>>                 goto rebalance;
>>
>> If we apply KOSAKI's patch[1], which doesn't depend on kswapd for setting zone->all_unreclaimable,
>> we can solve this problem by killing some task. But it still doesn't wake up kswapd.
>> It could still be a problem if another subsystem needs a GFP_ATOMIC request.
>> So kswapd should consider MIGRATE_ISOLATE when it calculates free pages before going to sleep.
>
> I agree. And I believe we should remove the rebalance label, and allocation
> retrying should always wake up kswapd,
> because wake_all_kswapd is unreliable: it has no guarantee of succeeding
> in waking up kswapd. So this
> micro-optimization is NOT an optimization, just a source of trouble. Our
> memory reclaim logic has a lot of races
> by design, so reclaim code shouldn't assume that someone else works fine.
>

I think this is a better approach. Since MIGRATE_ISOLATE is really a temporary phenomenon, it makes sense to just retry the allocation.
One issue with this approach, however, is that it does not exactly work for PAGE_ALLOC_COSTLY_ORDER allocations. But given the frequency of such allocations, I think it may be an acceptable compromise to handle such requests by OOM when many MIGRATE_ISOLATE pages are present. What do you think?

>
>> Firstly I tried to solve this problem by this.
>> https://lkml.org/lkml/2012/6/20/30
>> The patch's goal was to NOT increase nr_free and NR_FREE_PAGES when we free a page into MIGRATE_ISOLATE.
>> It adds a little overhead to higher-order page freeing, but I think that's not a big deal.
>> The bigger problem is duplicated code for handling only MIGRATE_ISOLATE freed pages.
>>
>> The second approach, suggested by KOSAKI, is what you mentioned.
>> But the concern about the second approach is how to make sure increases and decreases of nr_isolated_areas are matched.
>> I mean, how to make sure nr_isolated_areas is zero when isolation is done.
>> Of course, we can investigate all current callers and make sure they don't make a mistake
>> now. But it's very error-prone if we consider future users.
>> So we might need test_set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>>
>> IMHO, the ideal solution is to remove the MIGRATE_ISOLATE type from buddy entirely.
>> For that, there is no problem isolating already-freed pages in the buddy allocator, but the concern is how to handle
>> pages freed later by do_migrate_range in memory_hotplug.c.
>> We can create a custom putback_lru_pages:
>>
>> put_page_hotplug(page)
>> {
>>         int migratetype = get_pageblock_migratetype(page);
>>         VM_BUG_ON(migratetype != MIGRATE_ISOLATE);
>>         __page_cache_release(page);
>>         free_one_page(zone, page, 0, MIGRATE_ISOLATE);
>> }
>>
>> putback_lru_pages_hotplug(&source)
>> {
>>         foreach page from source
>>                 put_page_hotplug(page)
>> }
>>
>> do_migrate_range()
>> {
>>         migrate_pages(&source);
>>         putback_lru_pages_hotplug(&source);
>> }
>>
>> I hope this summary can help you, Kame, and if I missed something, please let me know.
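
On the matched-count concern quoted above, here is a rough userspace sketch of what a test-and-set style helper might look like. It is only a model: every name in it (model_zone, model_test_set_isolate, model_test_clear_isolate) is hypothetical and not an existing kernel interface, and there is no locking. The idea is that the counter changes only on a real state transition, so a duplicated or unbalanced call cannot skew nr_isolated_areas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_migratetype { MODEL_MOVABLE = 0, MODEL_ISOLATE };

struct model_zone {
	enum model_migratetype block_type[8];	/* one entry per pageblock */
	long nr_isolated_areas;			/* counter discussed in the thread */
};

/*
 * Mark a pageblock isolated and return whether it already was.
 * The counter is bumped only when the block really changes state.
 */
static bool model_test_set_isolate(struct model_zone *z, size_t block)
{
	bool was_isolated = (z->block_type[block] == MODEL_ISOLATE);

	if (!was_isolated) {
		z->block_type[block] = MODEL_ISOLATE;
		z->nr_isolated_areas++;
	}
	return was_isolated;
}

/*
 * Undo isolation and return whether the block was isolated.
 * Clearing a block that is not isolated leaves the counter alone.
 */
static bool model_test_clear_isolate(struct model_zone *z, size_t block)
{
	bool was_isolated = (z->block_type[block] == MODEL_ISOLATE);

	if (was_isolated) {
		z->block_type[block] = MODEL_MOVABLE;
		z->nr_isolated_areas--;
		assert(z->nr_isolated_areas >= 0);
	}
	return was_isolated;
}
```

With a zero-initialized model_zone, isolating the same block twice and then clearing it twice leaves nr_isolated_areas at zero. A real version would also need to run under the zone lock, which this sketch ignores.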
>
> I disagree with this, because memory hotplug intentionally doesn't use
> stopmachine. That is because
> we don't stop any system service while memory is being unplugged. That
> is to say, various subsystems
> try to allocate memory during page migration for memory unplug. IOW,
> we shouldn't assume do_migrate_page()
> is the only caller.
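
To make the watermark point from the summary concrete, here is a minimal userspace sketch of a check that discounts free pages sitting in isolated pageblocks. It is an assumption-laden model, not the kernel's zone_watermark_ok_safe: the struct, the nr_isolated_free field, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-zone counts; not kernel structures. */
struct model_zone_counts {
	long nr_free_pages;	/* every page on the buddy free lists */
	long nr_isolated_free;	/* assumed count of free pages in MIGRATE_ISOLATE blocks */
	long high_watermark;	/* level kswapd wants to reach before sleeping */
};

/* Free pages an allocation could actually use. */
static long model_usable_free(const struct model_zone_counts *z)
{
	assert(z->nr_isolated_free <= z->nr_free_pages);
	return z->nr_free_pages - z->nr_isolated_free;
}

/*
 * Watermark check. With discount_isolated == false this mimics the
 * behaviour described in the thread, where isolated free pages are
 * counted and kswapd may go to sleep too early.
 */
static bool model_watermark_ok(const struct model_zone_counts *z,
			       bool discount_isolated)
{
	long free_pages = discount_isolated ? model_usable_free(z)
					    : z->nr_free_pages;

	return free_pages > z->high_watermark;
}
```

For a zone with 1000 free pages of which 900 are isolated and a watermark of 200, the undiscounted check passes while the discounted one fails, which is the case where kswapd should stay awake.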