Hi Hannes, On Tue, Nov 30, 2010 at 2:22 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > On Tue, Nov 30, 2010 at 12:23:19AM +0900, Minchan Kim wrote: >> Recently, there are reported problem about thrashing. >> (http://marc.info/?l=rsync&m=128885034930933&w=2) >> It happens by backup workloads(ex, nightly rsync). >> That's because the workload makes just use-once pages >> and touches pages twice. It promotes the page into >> active list so that it results in working set page eviction. >> >> Some app developer want to support POSIX_FADV_NOREUSE. >> But other OSes don't support it, either. >> (http://marc.info/?l=linux-mm&m=128928979512086&w=2) >> >> By other approach, app developers use POSIX_FADV_DONTNEED. >> But it has a problem. If kernel meets page is writing >> during invalidate_mapping_pages, it can't work. >> It is very hard for application programmer to use it. >> Because they always have to sync data before calling >> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could >> be discardable. At last, they can't use deferred write of kernel >> so that they could see performance loss. >> (http://insights.oetiker.ch/linux/fadvise.html) >> >> In fact, invalidation is very big hint to reclaimer. >> It means we don't use the page any more. So let's move >> the writing page into inactive list's head. >> >> Why I need the page to head, Dirty/Writeback page would be flushed >> sooner or later. It can prevent writeout of pageout which is less >> effective than flusher's writeout. >> >> Originally, I reused lru_demote of Peter with some change so added >> his Signed-off-by. >> >> Reported-by: Ben Gamari <bgamari.foss@xxxxxxxxx> >> Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx> >> Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx> >> Acked-by: Rik van Riel <riel@xxxxxxxxxx> >> Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx> >> Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> >> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> >> Cc: Nick Piggin <npiggin@xxxxxxxxx> >> Cc: Mel Gorman <mel@xxxxxxxxx> >> >> Adnrew. Before applying this series, please drop below two patches. >> mm-deactivate-invalidated-pages.patch >> mm-deactivate-invalidated-pages-fix.patch >> >> Changelog since v2: >> - mapped page leaves alone - suggested by Mel >> - pass part related PG_reclaim in next patch. >> >> Changelog since v1: >> - modify description >> - correct typo >> - add some comment >> --- >> include/linux/swap.h | 1 + >> mm/swap.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> mm/truncate.c | 16 +++++++-- >> 3 files changed, 93 insertions(+), 4 deletions(-) >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h >> index eba53e7..84375e4 100644 >> --- a/include/linux/swap.h >> +++ b/include/linux/swap.h > > [...] > >> @@ -267,6 +270,63 @@ void add_page_to_unevictable_list(struct page *page) >> } >> >> /* >> + * This function is used by invalidate_mapping_pages. >> + * If the page can't be invalidated, this function moves the page >> + * into inative list's head. Because the VM expects the page would >> + * be writeout by flusher. The flusher's writeout is much effective >> + * than reclaimer's random writeout. > > The wording is a bit confusing, I find. It sounds a bit like the > flusher's chance is increased by moving it to the inactive list in the > first place, but the key is that it is moved to the head instead of, > what one would expect, the tail of the list. How about: > > If the page can not be invalidated, it is moved to the > inactive list to speed up its reclaim. It is moved to the > head of the list, rather than the tail, to give the flusher > threads some time to write it out, as this is much more > effective than the single-page writeout from reclaim. > Looks good to me. I will add your comment instead of my ugly comment. >> +static void __lru_deactivate(struct page *page, struct zone *zone) > > Do you insist on the underscores? :) Good point. __lru_deactivate is self-contained. It is valuable enough using other places. I will remove underscores. > >> +{ >> + int lru, file; >> + unsigned long vm_flags; >> + >> + if (!PageLRU(page) || !PageActive(page)) >> + return; >> + >> + /* Some processes are using the page */ >> + if (page_mapped(page)) >> + return; >> + >> + file = page_is_file_cache(page); >> + lru = page_lru_base_type(page); >> + del_page_from_lru_list(zone, page, lru + LRU_ACTIVE); >> + ClearPageActive(page); >> + ClearPageReferenced(page); >> + add_page_to_lru_list(zone, page, lru); >> + __count_vm_event(PGDEACTIVATE); >> + >> + update_page_reclaim_stat(zone, page, file, 0); >> +} >> + >> +/* >> + * This function must be called with preemption disable. > > Why is that? Unless I missed something, the only thing that needs > protection is the per-cpu pagevec reference the only user of this > function passes in. But this should be the caller's concern and is > not really a requirement of this function per-se, is it? Yes. It's unnecessary. I didn't consider enoughly. Will fix. > >> +static void __pagevec_lru_deactivate(struct pagevec *pvec) > > More underscores! Will fix. > >> +{ >> + int i; >> + struct zone *zone = NULL; >> + >> + for (i = 0; i < pagevec_count(pvec); i++) { >> + struct page *page = pvec->pages[i]; >> + struct zone *pagezone = page_zone(page); >> + >> + if (pagezone != zone) { >> + if (zone) >> + spin_unlock_irq(&zone->lru_lock); >> + zone = pagezone; >> + spin_lock_irq(&zone->lru_lock); >> + } >> + __lru_deactivate(page, zone); >> + } >> + if (zone) >> + spin_unlock_irq(&zone->lru_lock); >> + >> + release_pages(pvec->pages, pvec->nr, pvec->cold); >> + pagevec_reinit(pvec); >> +} >> + >> +/* >> * Drain pages out of the cpu's pagevecs. >> * Either "cpu" is the current CPU, and preemption has already been >> * disabled; or "cpu" is being hot-unplugged, and is already dead. >> @@ -292,6 +352,26 @@ static void drain_cpu_pagevecs(int cpu) >> pagevec_move_tail(pvec); >> local_irq_restore(flags); >> } >> + >> + pvec = &per_cpu(lru_deactivate_pvecs, cpu); >> + if (pagevec_count(pvec)) >> + __pagevec_lru_deactivate(pvec); >> +} >> + >> +/* >> + * Forcefully deactivate a page. >> + * This function is used for reclaiming the page ASAP when the page >> + * can't be invalidated by Dirty/Writeback. > > How about: > > /** > * lru_deactivate_page - forcefully deactivate a page > * @page: page to deactivate > * > * This function hints the VM that @page is a good reclaim candidate, > * for example if its invalidation fails due to the page being dirty > * or under writeback. > */ > >> +void lru_deactivate_page(struct page *page) > > I would love that lru_ prefix for most of the API in this file. In > fact, the file should probably be called lru.c. But for now, can you > keep the naming consistent and call it deactivate_page? No matter. I can change it. but deactivate_page will be asymmetric about that deactivate_page move active page into inactive forcefully(two step) while activate_page does one step activation. That's why I name it. > >> + if (likely(get_page_unless_zero(page))) { >> + struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs); >> + >> + if (!pagevec_add(pvec, page)) >> + __pagevec_lru_deactivate(pvec); >> + put_cpu_var(lru_deactivate_pvecs); >> + } >> } >> >> void lru_add_drain(void) > > [...] > >> @@ -359,8 +360,15 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping, >> if (lock_failed) >> continue; >> >> - ret += invalidate_inode_page(page); >> - >> + ret = invalidate_inode_page(page); >> + /* >> + * If the page was dirty or under writeback we cannot >> + * invalidate it now. Move it to the head of the >> + * inactive LRU for using deferred writeback of flusher. > > This would also be less confusing if it would say > > Move it to the head of the inactive LRU (rather than the tail) > for using [...] > > But I am not sure that this detail is interesting at this point. It > would be more interesting to name the reasons for why the page is > moved to the inactive list in the first place: > > If the page is dirty or under writeback, we can not invalidate > it now. But we assume that attempted invalidation is a hint > that the page is no longer of interest and try to speed up its > reclaim. > Will fix. I hope listen you guys's opinions about [2/3], too. :) Thanks, Hannes. >> + */ >> + if (!ret) >> + lru_deactivate_page(page); >> + count += ret; >> unlock_page(page); >> if (next > end) >> break; > > Hannes > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/ Don't email: <a href