> Let me summarize again.
>
> The problem:
>
> When hotplug offlining happens on zone A, it starts to free pages as MIGRATE_ISOLATE type in buddy.
> (MIGRATE_ISOLATE is a very ironic type because the pages are apparently in buddy but we can't allocate them.)
> When a memory shortage happens during hotplug offlining, the current task starts to reclaim, then wakes up kswapd.
> Kswapd checks the watermark, then goes to sleep BECAUSE the current zone_watermark_ok_safe doesn't consider
> the MIGRATE_ISOLATE freed page count. The current task continues to reclaim in the direct reclaim path without kswapd's help.
> The problem is that zone->all_unreclaimable is set only by kswapd, so the current task would loop forever
> like below.
>
> __alloc_pages_slowpath
> restart:
>         wake_all_kswapd
> rebalance:
>         __alloc_pages_direct_reclaim
>                 do_try_to_free_pages
>                         if global_reclaim && !all_unreclaimable
>                                 return 1; /* It means we did did_some_progress */
>         skip __alloc_pages_may_oom
>         should_alloc_retry
>                 goto rebalance;
>
> If we apply KOSAKI's patch[1], which doesn't depend on kswapd for setting zone->all_unreclaimable,
> we can solve this problem by killing some task. But it still doesn't wake up kswapd.
> It could still be a problem if another subsystem needs a GFP_ATOMIC request.
> So kswapd should consider MIGRATE_ISOLATE when it calculates free pages before going to sleep.

I agree. And I believe we should remove the rebalance label, and allocation
retrying should always wake up kswapd, because wake_all_kswapd is unreliable:
it has no guarantee of successfully waking up kswapd. So this micro
optimization is NOT an optimization, just a source of trouble. Our memory
reclaim logic has a lot of races by design, so no reclaim code should assume
that someone else is working fine.

> Firstly I tried to solve this problem by this.
> https://lkml.org/lkml/2012/6/20/30
> The patch's goal was to NOT increase nr_free and NR_FREE_PAGES when we free a page into MIGRATE_ISOLATE.
> But it adds a little overhead when freeing higher-order pages, though I think it's not a big deal.
> The bigger problem is duplicated code for handling only MIGRATE_ISOLATE freed pages.
>
> The second approach, which was suggested by KOSAKI, is what you mentioned.
> But the concern about the second approach is how to make sure increases and decreases of nr_isolated_areas are matched.
> I mean how to make sure nr_isolated_areas would be zero when isolation is done.
> Of course, we can investigate all of the current callers and make sure they don't make mistakes
> now. But it's very error-prone if we consider future users.
> So we might need test_set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>
> IMHO, the ideal solution is to remove the MIGRATE_ISOLATE type from buddy totally.
> For that, there is no problem isolating already freed pages in the buddy allocator, but the concern is how to handle
> pages freed later by do_migrate_range in memory_hotplug.c.
> We can create a custom putback_lru_pages
>
> put_page_hotplug(page)
> {
>         int migratetype = get_pageblock_migratetype(page);
>         VM_BUG_ON(migratetype != MIGRATE_ISOLATE);
>         __page_cache_release(page);
>         free_one_page(zone, page, 0, MIGRATE_ISOLATE);
> }
>
> putback_lru_pages_hotplug(&source)
> {
>         foreach page from source
>                 put_page_hotplug(page)
> }
>
> do_migrate_range()
> {
>         migrate_pages(&source);
>         putback_lru_pages_hotplug(&source);
> }
>
> I hope this summary can help you, Kame, and if I missed something, please let me know.

I disagree with this, because memory hotplug intentionally doesn't use
stop_machine. That is because we don't stop any system service while memory
is being unplugged. That said, various subsystems try to allocate memory
during page migration for memory unplug. IOW, we shouldn't assume
do_migrate_page() is the only caller.
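
To make the watermark point from the first half of this mail concrete (kswapd
should not count MIGRATE_ISOLATE free pages before going to sleep), here is a
toy userspace model. It is only a sketch under assumptions and not kernel code:
struct toy_zone, the nr_isolated_free counter and all toy_* helpers are
hypothetical names invented for illustration, and the real
zone_watermark_ok_safe takes more inputs than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the few per-zone numbers discussed in this thread. */
struct toy_zone {
	unsigned long nr_free;          /* every page sitting on the buddy free lists */
	unsigned long nr_isolated_free; /* hypothetical: free pages in MIGRATE_ISOLATE blocks */
	unsigned long watermark;        /* threshold kswapd compares against */
};

/* Models the reported behaviour: isolated free pages are counted as usable. */
static bool toy_watermark_ok_naive(const struct toy_zone *z)
{
	return z->nr_free > z->watermark;
}

/* Free pages an allocation could really get: isolated ones are excluded. */
static unsigned long toy_allocatable_free(const struct toy_zone *z)
{
	assert(z->nr_isolated_free <= z->nr_free);
	return z->nr_free - z->nr_isolated_free;
}

/* Models the proposed behaviour: only allocatable pages count. */
static bool toy_watermark_ok_isolate_aware(const struct toy_zone *z)
{
	return toy_allocatable_free(z) > z->watermark;
}
```

With nr_free = 1000, nr_isolated_free = 900 and watermark = 200, the naive
check says the zone is fine (so kswapd would sleep) while the isolate-aware
check says it is not, which is the mismatch described in the problem summary.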