On Fri, Jun 22, 2012 at 3:56 AM, Aaditya Kumar <aaditya.kumar.30@xxxxxxxxx> wrote:
> On Fri, Jun 22, 2012 at 12:52 PM, KOSAKI Motohiro
> <kosaki.motohiro@xxxxxxxxx> wrote:
>>> Let me summarize again.
>>>
>>> The problem:
>>>
>>> When hotplug offlining happens on zone A, it starts to free pages as MIGRATE_ISOLATE type in buddy.
>>> (MIGRATE_ISOLATE is a very ironic type because the pages are apparently in buddy but we can't allocate them.)
>>> When a memory shortage happens during hotplug offlining, the current task starts to reclaim, then wakes up kswapd.
>>> Kswapd checks the watermark, then goes to sleep BECAUSE the current zone_watermark_ok_safe doesn't take
>>> the MIGRATE_ISOLATE freed page count into account. The current task continues to reclaim in the direct
>>> reclaim path without kswapd's help.
>>> The problem is that zone->all_unreclaimable is set only by kswapd, so the current task would loop forever
>>> like below.
>>>
>>> __alloc_pages_slowpath
>>> restart:
>>>     wake_all_kswapd
>>> rebalance:
>>>     __alloc_pages_direct_reclaim
>>>         do_try_to_free_pages
>>>             if global_reclaim && !all_unreclaimable
>>>                 return 1; /* It means we did did_some_progress */
>>>     skip __alloc_pages_may_oom
>>>     should_alloc_retry
>>>         goto rebalance;
>>>
>>> If we apply KOSAKI's patch[1], which doesn't depend on kswapd for setting zone->all_unreclaimable,
>>> we can solve this problem by killing some task. But it still doesn't wake up kswapd.
>>> It could still be a problem if another subsystem needs a GFP_ATOMIC request.
>>> So kswapd should consider MIGRATE_ISOLATE when it calculates free pages before going to sleep.
>>
>> I agree. And I believe we should remove the rebalance label, and alloc
>> retrying should always wake up kswapd,
>> because wake_all_kswapd is unreliable; it has no guarantee of succeeding
>> in waking up kswapd. So this
>> micro optimization is NOT an optimization. Just a source of trouble. Our
>> memory reclaim logic has a lot of races
>> by design, so no reclaim code should assume someone else works fine.
>>
>
> I think this is a better approach. Since MIGRATE_ISOLATE is really a
> temporary phenomenon, it makes sense to just retry the allocation.
> One issue with this approach, however, is that it does not exactly work
> for PAGE_ALLOC_COSTLY_ORDER. But given the
> frequency of such allocations, I think it may be an acceptable
> compromise to handle such requests by OOM when many
> MIGRATE_ISOLATE pages are present.
>
> What do you think?

I think we need both changes.
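
To make the watermark half of this concrete, here is a rough toy model in C of why kswapd goes back to sleep while isolated pages sit on the free lists. Everything named toy_* is made up for illustration, and the nr_isolate_free counter is an assumption; this is not the real struct zone or zone_watermark_ok_safe(), only a sketch of the idea of subtracting MIGRATE_ISOLATE free pages before comparing against the watermark.

```c
#include <stdbool.h>

/*
 * Toy stand-in for the per-zone counters discussed in this thread.
 * These fields are hypothetical; this is not the kernel's struct zone.
 */
struct toy_zone {
    long nr_free;         /* pages on the buddy free lists, isolated ones included */
    long nr_isolate_free; /* free pages of type MIGRATE_ISOLATE (assumed counter) */
    long watermark;       /* threshold kswapd compares against */
};

/* Behaviour as described in the thread: isolated pages count as free. */
static bool toy_watermark_ok(const struct toy_zone *z)
{
    return z->nr_free > z->watermark;
}

/* Sketch of the proposal: ignore free pages that cannot be allocated. */
static bool toy_watermark_ok_isolate_aware(const struct toy_zone *z)
{
    long usable = z->nr_free - z->nr_isolate_free;

    if (usable < 0)
        usable = 0;
    return usable > z->watermark;
}

/* In this model kswapd may sleep whenever the watermark check passes. */
static bool toy_kswapd_may_sleep(const struct toy_zone *z, bool isolate_aware)
{
    return isolate_aware ? toy_watermark_ok_isolate_aware(z)
                         : toy_watermark_ok(z);
}
```

With nr_free = 1000, nr_isolate_free = 900 and watermark = 200, the first check lets kswapd sleep even though only 100 pages are really allocatable, while the isolate-aware check keeps it awake. The retry side (always waking kswapd when the allocation is retried) is a separate change and is not modelled here.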