Re: [PATCH v2 1/3] deactivate invalidated pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Wu,

On Mon, Nov 29, 2010 at 4:49 PM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> On Sun, Nov 28, 2010 at 11:02:55PM +0800, Minchan Kim wrote:
>> This patch is based on mmotm-11-23.
>
> I cannot find __pagevec_lru_deactive() in mmotm-11-23.
> Do you have any more patches?

Please see this patch.
http://www.spinics.net/lists/mm-commits/msg80851.html

>
>> Recently, there are reported problem about thrashing.
>> (http://marc.info/?l=rsync&m=128885034930933&w=2)
>> It happens by backup workloads(ex, nightly rsync).
>> That's because the workload makes just use-once pages
>> and touches pages twice. It promotes the page into
>> active list so that it results in working set page eviction.
>>
>> Some app developer want to support POSIX_FADV_NOREUSE.
>> But other OSes don't support it, either.
>> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
>>
>> By Other approach, app developer uses POSIX_FADV_DONTNEED.
>> But it has a problem. If kernel meets page is writing
>> during invalidate_mapping_pages, it can't work.
>> It is very hard for application programmer to use it.
>> Because they always have to sync data before calling
>> fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could
>> be discardable. At last, they can't use deferred write of kernel
>> so that they could see performance loss.
>> (http://insights.oetiker.ch/linux/fadvise.html)
>>
>> In fact, invalidation is very big hint to reclaimer.
>> It means we don't use the page any more. So let's move
>> the writing page into inactive list's head.
>>
>> Why I need the page to head, Dirty/Writeback page would be flushed
>> sooner or later. This patch uses trick PG_reclaim so the page would
>> be moved into tail of inactive list when the page writeout completes.
>>
>> It can prevent writeout of pageout which is less effective than
>> flusher's writeout.
>>
>> This patch considers page_mappged(page) with working set.
>> So the page could leave head of inactive to get a change to activate.
>>
>> Originally, I reused lru_demote of Peter with some change so added
>> his Signed-off-by.
>>
>> Note :
>> PG_reclaim trick of writeback page could race with end_page_writeback
>> so this patch check PageWriteback one more. It makes race window time
>> reall small. But by theoretical, it still have a race. But it's a trivial.
>>
>> Quote from fe3cba17 and some modification
>> "If some page PG_reclaim unintentionally, it will confuse readahead and
>> make it restart the size rampup process. But it's a trivial problem, and
>> can mostly be avoided by checking PageWriteback(page) first in readahead"
>>
>> PG_reclaim trick of dirty page don't work now since clear_page_dirty_for_io
>> always clears PG_reclaim. Next patch will fix it.
>>
>> Reported-by: Ben Gamari <bgamari.foss@xxxxxxxxx>
>> Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
>> Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
>> Cc: Rik van Riel <riel@xxxxxxxxxx>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Cc: Nick Piggin <npiggin@xxxxxxxxx>
>> Cc: Mel Gorman <mel@xxxxxxxxx>
>>
>> Changelog since v1:
>>  - modify description
>>  - correct typo
>>  - add some comment
>>  - change deactivation policy
>> ---
>>  mm/swap.c |   84 +++++++++++++++++++++++++++++++++++++++++++++---------------
>>  1 files changed, 63 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 31f5ec4..345eca1 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -268,10 +268,65 @@ void add_page_to_unevictable_list(struct page *page)
>>       spin_unlock_irq(&zone->lru_lock);
>>  }
>>
>> -static void __pagevec_lru_deactive(struct pagevec *pvec)
>> +/*
>> + * This function is used by invalidate_mapping_pages.
>> + * If the page can't be invalidated, this function moves the page
>> + * into inative list's head or tail to reclaim ASAP and evict
>> + * working set page.
>> + *
>> + * PG_reclaim means when the page's writeback completes, the page
>> + * will move into tail of inactive for reclaiming ASAP.
>> + *
>> + * 1. active, mapped page -> inactive, head
>> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
>> + * 3. inactive, mapped page -> none
>> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
>> + * 5. others -> none
>> + *
>> + * In 4, why it moves inactive's head, the VM expects the page would
>> + * be writeout by flusher. The flusher's writeout is much effective than
>> + * reclaimer's random writeout.
>> + */
>> +static void __lru_deactivate(struct page *page, struct zone *zone)
>>  {
>> -     int i, lru, file;
>> +     int lru, file;
>> +     int active = 0;
>> +
>> +     if (!PageLRU(page))
>> +             return;
>> +
>> +     if (PageActive(page))
>> +             active = 1;
>> +     /* Some processes are using the page */
>> +     if (page_mapped(page) && !active)
>> +             return;
>
> It's good to check such protections if doing heuristic demotion.
> However if requested explicitly by the user, I'm _much more_ inclined
> to act stupid&dumb and meet the user's expectation. Or will this code
> be called by someone other than DONTNEED? Sorry I have no context of
> the full code.

Sorry.

Yes. I expect lru_deactive_page can be used by other places with some
modification.
First thing I expected is here.

http://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg179576.html
After I make sure this patch's effective, I will try it, too.


>
>> +     else if (PageWriteback(page)) {
>> +             SetPageReclaim(page);
>> +             /* Check race with end_page_writeback */
>> +             if (!PageWriteback(page))
>> +                     ClearPageReclaim(page);
>
> Does the double check help a lot?
>
>> +     } else if (PageDirty(page))
>> +             SetPageReclaim(page);
>
> Typically there are much more dirty pages than writeback pages.
> I guess it's a good place to call bdi_start_inode_writeback() which
> was posted here: http://www.spinics.net/lists/linux-mm/msg10833.html

It looks good to me.
It makes my code very simple.

I can use it. It means my patch depends on yours patch.
Do you have a plan to merge it?


>
> Thanks,
> Fengguang
>
>> +
>> +     file = page_is_file_cache(page);
>> +     lru = page_lru_base_type(page);
>> +     del_page_from_lru_list(zone, page, lru + active);
>> +     ClearPageActive(page);
>> +     ClearPageReferenced(page);
>> +     add_page_to_lru_list(zone, page, lru);
>> +     if (active)
>> +             __count_vm_event(PGDEACTIVATE);
>> +
>> +     update_page_reclaim_stat(zone, page, file, 0);
>> +}
>>
>> +/*
>> + * This function must be called with preemption disable.
>> + */
>> +static void __pagevec_lru_deactivate(struct pagevec *pvec)
>> +{
>> +     int i;
>>       struct zone *zone = NULL;
>>
>>       for (i = 0; i < pagevec_count(pvec); i++) {
>> @@ -284,21 +339,7 @@ static void __pagevec_lru_deactive(struct pagevec *pvec)
>>                       zone = pagezone;
>>                       spin_lock_irq(&zone->lru_lock);
>>               }
>> -
>> -             if (PageLRU(page)) {
>> -                     if (PageActive(page)) {
>> -                             file = page_is_file_cache(page);
>> -                             lru = page_lru_base_type(page);
>> -                             del_page_from_lru_list(zone, page,
>> -                                             lru + LRU_ACTIVE);
>> -                             ClearPageActive(page);
>> -                             ClearPageReferenced(page);
>> -                             add_page_to_lru_list(zone, page, lru);
>> -                             __count_vm_event(PGDEACTIVATE);
>> -
>> -                             update_page_reclaim_stat(zone, page, file, 0);
>> -                     }
>> -             }
>> +             __lru_deactivate(page, zone);
>>       }
>>       if (zone)
>>               spin_unlock_irq(&zone->lru_lock);
>> @@ -336,11 +377,13 @@ static void drain_cpu_pagevecs(int cpu)
>>
>>       pvec = &per_cpu(lru_deactivate_pvecs, cpu);
>>       if (pagevec_count(pvec))
>> -             __pagevec_lru_deactive(pvec);
>> +             __pagevec_lru_deactivate(pvec);
>>  }
>>
>>  /*
>> - * Forecfully demote a page to the tail of the inactive list.
>> + * Forcefully deactivate a page.
>> + * This function is used for reclaiming the page ASAP when the page
>> + * can't be invalidated by Dirty/Writeback.
>>   */
>>  void lru_deactivate_page(struct page *page)
>>  {
>> @@ -348,12 +391,11 @@ void lru_deactivate_page(struct page *page)
>>               struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
>>
>>               if (!pagevec_add(pvec, page))
>> -                     __pagevec_lru_deactive(pvec);
>> +                     __pagevec_lru_deactivate(pvec);
>>               put_cpu_var(lru_deactivate_pvecs);
>>       }
>>  }
>>
>> -
>>  void lru_add_drain(void)
>>  {
>>       drain_cpu_pagevecs(get_cpu());
>> --
>> 1.7.0.4
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]