On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> * MinChan Kim <minchan.kim@xxxxxxxxx> [2011-01-28 14:44:50]:
>
>> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
>> <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>> > [snip]
>> >
>> >>> index 7b56473..2ac8549 100644
>> >>> --- a/mm/page_alloc.c
>> >>> +++ b/mm/page_alloc.c
>> >>> @@ -1660,6 +1660,9 @@ zonelist_scan:
>> >>>                         unsigned long mark;
>> >>>                         int ret;
>> >>>
>> >>> +                       if (should_reclaim_unmapped_pages(zone))
>> >>> +                               wakeup_kswapd(zone, order, classzone_idx);
>> >>> +
>> >>
>> >> Do we really need the check in fastpath?
>> >> There are lots of callers of alloc_pages.
>> >> Many of them are not related to mapped pages.
>> >> Could we move the check into add_to_page_cache_locked?
>> >
>> > The check is a simple check to see if the unmapped pages need
>> > balancing, the reason I placed this check here is to allow other
>> > allocations to benefit as well, if there are some unmapped pages to be
>> > freed. add_to_page_cache_locked (check under a critical section) is
>> > even worse, IMHO.
>>
>> It just moves the overhead from the general case into a specific case
>> (i.e., allocating pages just for page cache).
>> Other cases (i.e., allocating pages for purposes other than page cache,
>> e.g. device drivers or fs allocations for internal use) aren't
>> affected.
>> So, it would be better.
>>
>> The goal in this patch is to remove only page cache pages, isn't it?
>> So I think we could do the balance check in add_to_page_cache and trigger reclaim.
>> If we do so, what's the problem?
>>
>
> I see it as a tradeoff of when to check: add_to_page_cache, or when we
> want more free memory (due to allocation).
> It is OK to wakeup
> kswapd while allocating memory, somehow for this purpose (global page
> cache), add_to_page_cache or add_to_page_cache_locked does not seem
> the right place to hook into. I'd be open to comments/suggestions
> though from others as well.
>
>> >
>> >
>> >>
>> >>>                         mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
>> >>>                         if (zone_watermark_ok(zone, order, mark,
>> >>>                                     classzone_idx, alloc_flags))
>> >>> @@ -4167,8 +4170,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>> >>>
>> >>>                 zone->spanned_pages = size;
>> >>>                 zone->present_pages = realsize;
>> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>> >>>                 zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
>> >>>                                                 / 100;
>> >>> +               zone->max_unmapped_pages = (realsize*sysctl_max_unmapped_ratio)
>> >>> +                                               / 100;
>> >>> +#endif
>> >>>  #ifdef CONFIG_NUMA
>> >>>                 zone->node = nid;
>> >>>                 zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
>> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, int write,
>> >>>         return 0;
>> >>>  }
>> >>>
>> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>> >>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>> >>>         void __user *buffer, size_t *length, loff_t *ppos)
>> >>>  {
>> >>> @@ -5100,6 +5108,23 @@ int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>> >>>         return 0;
>> >>>  }
>> >>>
>> >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>> >>> +       void __user *buffer, size_t *length, loff_t *ppos)
>> >>> +{
>> >>> +       struct zone *zone;
>> >>> +       int rc;
>> >>> +
>> >>> +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
>> >>> +       if (rc)
>> >>> +               return rc;
>> >>> +
>> >>> +       for_each_zone(zone)
>> >>> +               zone->max_unmapped_pages = (zone->present_pages *
>> >>> +                               sysctl_max_unmapped_ratio) / 100;
>> >>> +       return 0;
>> >>> +}
>> >>> +#endif
>> >>> +
>> >>>  #ifdef CONFIG_NUMA
>> >>>  int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
>> >>>         void __user *buffer, size_t *length, loff_t *ppos)
>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> >>> index 02cc82e..6377411 100644
>> >>> --- a/mm/vmscan.c
>> >>> +++ b/mm/vmscan.c
>> >>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
>> >>>  #define scanning_global_lru(sc)        (1)
>> >>>  #endif
>> >>>
>> >>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
>> >>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
>> >>> +                                               struct scan_control *sc);
>> >>> +static int unmapped_page_control __read_mostly;
>> >>> +
>> >>> +static int __init unmapped_page_control_parm(char *str)
>> >>> +{
>> >>> +       unmapped_page_control = 1;
>> >>> +       /*
>> >>> +        * XXX: Should we tweak swappiness here?
>> >>> +        */
>> >>> +       return 1;
>> >>> +}
>> >>> +__setup("unmapped_page_control", unmapped_page_control_parm);
>> >>> +
>> >>> +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
>> >>> +static inline unsigned long reclaim_unmapped_pages(int priority,
>> >>> +                               struct zone *zone, struct scan_control *sc)
>> >>> +{
>> >>> +       return 0;
>> >>> +}
>> >>> +#endif
>> >>> +
>> >>>  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
>> >>>                                                   struct scan_control *sc)
>> >>>  {
>> >>> @@ -2359,6 +2382,12 @@ loop_again:
>> >>>                                 shrink_active_list(SWAP_CLUSTER_MAX, zone,
>> >>>                                                         &sc, priority, 0);
>> >>>
>> >>> +                       /*
>> >>> +                        * We do unmapped page reclaim once here and once
>> >>> +                        * below, so that we don't lose out
>> >>> +                        */
>> >>> +                       reclaim_unmapped_pages(priority, zone, &sc);
>> >>> +
>> >>>                         if (!zone_watermark_ok_safe(zone, order,
>> >>>                                         high_wmark_pages(zone), 0, 0)) {
>> >>>                                 end_zone = i;
>> >>> @@ -2396,6 +2425,11 @@ loop_again:
>> >>>                                 continue;
>> >>>
>> >>>                         sc.nr_scanned = 0;
>> >>> +                       /*
>> >>> +                        * Reclaim unmapped pages upfront, this should be
>> >>> +                        * really cheap
>> >>> +                        */
>> >>> +                       reclaim_unmapped_pages(priority, zone, &sc);
>> >>
>> >> Why should we do it in two phases?
>> >> It's not a direct reclaim path. I mean it doesn't need to reclaim tightly.
>> >> If we can't reclaim enough, the next allocation would wake up kswapd again
>> >> and kswapd would try again.
>> >>
>> >
>> > I am not sure I understand, the wakeup will occur only if the unmapped
>> > pages are still above the max_unmapped_ratio. They are tunable control
>> > points.
>>
>> I mean you try to reclaim twice in one path.
>> One is when selecting the highest zone to reclaim.
>> The other is when the VM reclaims the zone.
>>
>> What's your intention?
>>
>
> That is because some zones can be skipped, we need to ensure we go
> through all zones, rather than selective zones (limited via search for
> end_zone).

If kswapd is woken up because of unmapped memory in some zone, we have to
include that zone when selecting victim zones so that we don't miss it.
I think that would be better than reclaiming twice.

>
>>
>> >
>> >> And I have a concern. I already pointed out.
>> >> If memory pressure is heavy and unmapped_pages is more than our
>> >> threshold, this can move inactive's tail pages which are mapped into
>> >> heads by reclaim_unmapped_pages. It can make confusing LRU order so
>> >> working set can be evicted.
>> >>
>> >
>> > Sorry, not sure I understand completely? The LRU order is disrupted
>> > because we selectively scan unmapped pages. shrink_page_list() will
>> > ignore mapped pages and put them back in the LRU at head? Here is a
>> > quick take on what happens
>> >
>> > zone_reclaim() will be invoked as a result of these patches and the
>> > pages it tries to reclaim is very few (1 << order). Active list will
>> > be shrunk only when the inactive anon or inactive list is low in size.
>> > I don't see a major churn happening unless we keep failing to reclaim
>> > unmapped pages. In any case we isolate inactive pages and try to
>> > reclaim minimal memory, the churn is mostly in the inactive list if
>> > the page is not reclaimed (am I missing anything?).
>>
>> You understand my question completely. :)
>> In inactive list, page order is important, too although it's weak
>> lumpy and compaction as time goes by.
>> If the threshold goes up and down frequently, victim pages in the inactive
>> list could move to the head, and that's not good.
>
> But the assumption for LRU order to change happens only if the page
> cannot be successfully freed, which means it is in some way active..
> and needs to be moved no?

1. pages held by someone
2. mapped pages
3. active pages

Case 1 is rare, so it isn't a problem. In case 3 we have to activate the
page anyway, so that is no problem either. The problem is case 2.

>
> Thanks for the detailed review!

Thanks for giving me the fun. :)

>
> --
>        Three Cheers,
>        Balbir
>

--
Kind regards,
Minchan Kim
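
For reference, the threshold arithmetic discussed in this thread can be
sketched in user-space C. This is a minimal illustration and not the
patch's code: struct zone_model and its counters are hypothetical
stand-ins for the kernel's struct zone, and the body of
should_reclaim_unmapped() is an assumption, since the quoted hunks call
should_reclaim_unmapped_pages() without showing its definition. Only the
(present_pages * ratio) / 100 computation and the "wake up only while
unmapped pages are above the max threshold" behaviour come from the
thread itself.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_model {
	unsigned long present_pages;      /* pages managed by this zone */
	unsigned long max_unmapped_pages; /* derived from the ratio below */
	unsigned long nr_file_pages;      /* page cache pages (assumed counter) */
	unsigned long nr_file_mapped;     /* page cache pages that are mapped */
};

/* Same arithmetic as the quoted sysctl handler: a percentage of the zone. */
static void set_max_unmapped(struct zone_model *z, unsigned int max_ratio)
{
	z->max_unmapped_pages = (z->present_pages * max_ratio) / 100;
}

/* Unmapped page cache = all page cache pages minus the mapped ones. */
static unsigned long unmapped_pages(const struct zone_model *z)
{
	if (z->nr_file_pages <= z->nr_file_mapped)
		return 0;
	return z->nr_file_pages - z->nr_file_mapped;
}

/*
 * Assumed check: ask for background reclaim only while the zone holds
 * more unmapped page cache than the tunable limit allows.
 */
static bool should_reclaim_unmapped(const struct zone_model *z)
{
	return unmapped_pages(z) > z->max_unmapped_pages;
}
```

Under these assumptions, a zone of 1000 pages with a ratio of 16 gets a
limit of 160 pages, so 300 unmapped page cache pages would trigger a
wakeup and 100 would not. Whether this check runs in the allocator
fastpath or in add_to_page_cache is exactly the placement question
debated above; the sketch takes no position on it.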