Hi Balbir,

On Fri, Feb 18, 2011 at 3:29 PM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
> * MinChan Kim <minchan.kim@xxxxxxxxx> [2011-02-18 00:08:19]:
>
>> Recently, a thrashing problem has been reported
>> (http://marc.info/?l=rsync&m=128885034930933&w=2).
>> It is caused by backup workloads (e.g., nightly rsync).
>> Such a workload creates use-once pages but touches them twice,
>> which promotes them to the active list and results in eviction
>> of working-set pages.
>>
>> Some application developers want POSIX_FADV_NOREUSE to be supported,
>> but other OSes don't support it either.
>> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>>
>> As another approach, application developers use POSIX_FADV_DONTNEED,
>> but it has a problem: if the kernel finds a page that is being written
>> during invalidate_mapping_pages, it cannot drop it. This makes the
>> hint hard for application programmers to use, since they always
>> have to sync data before calling fadvise(..POSIX_FADV_DONTNEED) to
>> make sure the pages are discardable. In the end, they cannot use the
>> kernel's deferred write, so they may see a performance loss.
>> (http://insights.oetiker.ch/linux/fadvise.html)
>>
>> In fact, invalidation is a very strong hint to the reclaimer:
>> it means we don't use the page any more. So let's move a page
>> that is being written to the head of the inactive list if we
>> can't truncate it right now.
>>
>> Why does this patch move the page to the head of the LRU? A
>> dirty/writeback page will be flushed sooner or later anyway, and
>> putting it at the head avoids writeout by reclaim's pageout(),
>> which is less effective than the flusher's writeout.
>>
>> Originally, I reused Peter's lru_demote with some changes, so I
>> added his Signed-off-by.
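The user-space workaround described above (sync first, then POSIX_FADV_DONTNEED) can be sketched as follows. This is a minimal illustration assuming a POSIX system; the helper name write_sync_dontneed() is hypothetical and not part of the patch. The fdatasync() step is what lets the hint actually drop the pages on a kernel without this patch, and it is also the step that costs the application its deferred writeback.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>	/* used by the usage example */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>	/* mkstemp(), used by the usage example */
#include <unistd.h>

/* Hypothetical helper: write buf to fd, wait until the data is on
 * stable storage, then ask the kernel to drop the cached pages.
 * Returns 0 on success, -1 if write()/fdatasync() failed (errno set),
 * or the positive error number returned by posix_fadvise(). */
int write_sync_dontneed(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Without this, pages that are still dirty or under writeback
	 * would be skipped by invalidate_mapping_pages(). The price is
	 * that the caller blocks instead of relying on deferred write. */
	if (fdatasync(fd) != 0)
		return -1;

	/* offset 0, len 0: apply the advice to the whole file. */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

For example, a backup tool could call it on each file it writes, e.g. one created with mkstemp(). The intent of the patch is that the fdatasync() step becomes unnecessary: pages that cannot be invalidated yet are moved to the inactive list for the flusher threads instead.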
>>
>> Reported-by: Ben Gamari <bgamari.foss@xxxxxxxxx>
>> Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
>> Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Acked-by: Rik van Riel <riel@xxxxxxxxxx>
>> Acked-by: Mel Gorman <mel@xxxxxxxxx>
>> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>> Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Cc: Nick Piggin <npiggin@xxxxxxxxx>
>> Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
>> ---
>> Changelog since v4:
>>  - Change function comments - suggested by Johannes
>>  - Change function name - suggested by Johannes
>>  - Drop only dirty/writeback pages into the deactivate pagevec - suggested by Johannes
>>  - Add acked-by
>>
>> Changelog since v3:
>>  - Change function comments - suggested by Johannes
>>  - Change function name - suggested by Johannes
>>  - Add only dirty/writeback pages to the deactivate pagevec
>>
>> Changelog since v2:
>>  - Leave mapped pages alone - suggested by Mel
>>  - Move the PG_reclaim-related part to the next patch
>>
>> Changelog since v1:
>>  - Modify description
>>  - Correct typo
>>  - Add some comments
>>
>>  include/linux/swap.h |    1 +
>>  mm/swap.c            |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  mm/truncate.c        |   17 ++++++++---
>>  3 files changed, 91 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 4d55932..c335055 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -215,6 +215,7 @@ extern void mark_page_accessed(struct page *);
>>  extern void lru_add_drain(void);
>>  extern int lru_add_drain_all(void);
>>  extern void rotate_reclaimable_page(struct page *page);
>> +extern void deactivate_page(struct page *page);
>>  extern void swap_setup(void);
>>
>>  extern void add_page_to_unevictable_list(struct page *page);
>> diff --git a/mm/swap.c b/mm/swap.c
>> index c02f936..4aea806 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -39,6 +39,7 @@ int page_cluster;
>>
>>  static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
>>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
>> +static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
>>
>>  /*
>>   * This path almost never happens for VM activity - pages are normally
>> @@ -347,6 +348,60 @@ void add_page_to_unevictable_list(struct page *page)
>>  }
>>
>>  /*
>> + * If the page can not be invalidated, it is moved to the
>> + * inactive list to speed up its reclaim.  It is moved to the
>> + * head of the list, rather than the tail, to give the flusher
>> + * threads some time to write it out, as this is much more
>> + * effective than the single-page writeout from reclaim.
>> + */
>> +static void lru_deactivate(struct page *page, struct zone *zone)
>> +{
>> +	int lru, file;
>> +
>> +	if (!PageLRU(page) || !PageActive(page))
>> +		return;
>> +
>> +	/* Some processes are using the page */
>> +	if (page_mapped(page))
>> +		return;
>> +
>> +	file = page_is_file_cache(page);
>> +	lru = page_lru_base_type(page);
>> +	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
>> +	ClearPageActive(page);
>> +	ClearPageReferenced(page);
>> +	add_page_to_lru_list(zone, page, lru);
>> +	__count_vm_event(PGDEACTIVATE);
>> +
>> +	update_page_reclaim_stat(zone, page, file, 0);
>> +}
>> +
>> +static void ____pagevec_lru_deactivate(struct pagevec *pvec)
>> +{
>> +	int i;
>> +	struct zone *zone = NULL;
>> +
>> +	for (i = 0; i < pagevec_count(pvec); i++) {
>> +		struct page *page = pvec->pages[i];
>> +		struct zone *pagezone = page_zone(page);
>> +
>> +		if (pagezone != zone) {
>> +			if (zone)
>> +				spin_unlock_irq(&zone->lru_lock);
>> +			zone = pagezone;
>> +			spin_lock_irq(&zone->lru_lock);
>> +		}
>
> The optimization to avoid taking locks if the zone does not change is
> quite subtle

I just used it without much thought, as it's a common technique we
already use when handling page arrays. So I'd like to keep it unless
it adds significant overhead.

>
>> +		lru_deactivate(page, zone);
>> +	}
>> +	if (zone)
>> +		spin_unlock_irq(&zone->lru_lock);
>> +
>> +	release_pages(pvec->pages, pvec->nr, pvec->cold);
>> +	pagevec_reinit(pvec);
>> +}
>> +
>> +
>> +/*
>>   * Drain pages out of the cpu's pagevecs.
>>   * Either "cpu" is the current CPU, and preemption has already been
>>   * disabled; or "cpu" is being hot-unplugged, and is already dead.
>> @@ -372,6 +427,29 @@ static void drain_cpu_pagevecs(int cpu)
>>  		pagevec_move_tail(pvec);
>>  		local_irq_restore(flags);
>>  	}
>> +
>> +	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>> +	if (pagevec_count(pvec))
>> +		____pagevec_lru_deactivate(pvec);
>> +}
>> +
>> +/**
>> + * deactivate_page - forcefully deactivate a page
>> + * @page: page to deactivate
>> + *
>> + * This function hints the VM that @page is a good reclaim candidate,
>> + * for example if its invalidation fails due to the page being dirty
>> + * or under writeback.
>> + */
>> +void deactivate_page(struct page *page)
>> +{
>> +	if (likely(get_page_unless_zero(page))) {
>> +		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>> +
>> +		if (!pagevec_add(pvec, page))
>> +			____pagevec_lru_deactivate(pvec);
>> +		put_cpu_var(lru_deactivate_pvecs);
>> +	}
>>  }
>>
>>  void lru_add_drain(void)
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 4d415b3..9ec7bc5 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -328,11 +328,12 @@ EXPORT_SYMBOL(truncate_inode_pages);
>>   * pagetables.
>>   */
>>  unsigned long invalidate_mapping_pages(struct address_space *mapping,
>> -				  pgoff_t start, pgoff_t end)
>> +		pgoff_t start, pgoff_t end)
>>  {
>>  	struct pagevec pvec;
>>  	pgoff_t next = start;
>> -	unsigned long ret = 0;
>> +	unsigned long ret;
>> +	unsigned long count = 0;
>>  	int i;
>>
>>  	pagevec_init(&pvec, 0);
>> @@ -359,8 +360,14 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>>  			if (lock_failed)
>>  				continue;
>>
>> -			ret += invalidate_inode_page(page);
>> -
>> +			ret = invalidate_inode_page(page);
>> +			/*
>> +			 * Invalidation is a hint that the page is no longer
>> +			 * of interest and try to speed up its reclaim.
>> +			 */
>> +			if (!ret)
>> +				deactivate_page(page);
>
> Do we need to do this under page_lock? Is there scope for us to reuse
> rotate_reclaimable_page() logic?

Good point. I don't think we need the page lock here. I will fix it.

As for rotate_reclaimable_page, its logic is somewhat similar, but
several page flag tests and the IRQ disabling are different, so I
think merging them would end up looking ugly. If you have a good
idea, please do the refactoring after this is merged.

Thanks for the review, Balbir.

--
Kind regards,
Minchan Kim