On Thu, Apr 28, 2011 at 8:06 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
> On Wed, Apr 27, 2011 at 01:25:23AM +0900, Minchan Kim wrote:
>> This patch defines new APIs to put back a page into its previous position on the LRU.
>> The idea is simple.
>>
>> When we put a page back onto the LRU list, if the page's friends (prev, next) are
>> still nearest neighbors, we can insert the isolated page right after prev instead of
>> at the head of the LRU list. That keeps the LRU history instead of losing the LRU
>> information.
>>
>> Before:
>>    LRU POV : H - P1 - P2 - P3 - P4 - T
>>
>> Isolate P3:
>>    LRU POV : H - P1 - P2 - P4 - T
>>
>> Putback P3:
>>    if (P2->next == P4)
>>        putback(P3, P2);
>>    So,
>>    LRU POV : H - P1 - P2 - P3 - P4 - T
>>
>> For the implementation, we define a new structure, pages_lru, which remembers
>> both LRU friend pages of the isolated one, plus handling functions.
>>
>> This approach has a problem with contiguous pages, though. In that case the
>> idea cannot work, since the friend pages are isolated too: prev_page->next ==
>> next_page is then always false, and neither page is on the LRU any more at that
>> time. Rik pointed this out at the LSF/MM summit.
>> To solve the problem, I can change the idea: we don't need the relation between
>> both friend (prev, next) pages; it is enough to consider either the prev or the
>> next page, provided it is still on the same LRU.
>> In the worst case with that approach, the prev or next page is freed and
>> reallocated, so it sits at the head of the LRU and our isolated page lands right
>> behind the head. But that is almost the same as the current situation, so it is
>> no worse than now, and it should be rare. In this version, however, I implement
>> the idea discussed at LSF/MM. If my new idea makes sense, I will change to it.
>>
>> Any comment?
>>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>> Cc: Mel Gorman <mgorman@xxxxxxx>
>> Cc: Rik van Riel <riel@xxxxxxxxxx>
>> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Signed-off-by: Minchan Kim <minchan.kim@xxxxxxxxx>
>> ---
>>  include/linux/migrate.h  |    2 +
>>  include/linux/mm_types.h |    7 ++++
>>  include/linux/swap.h     |    4 ++-
>>  mm/compaction.c          |    3 +-
>>  mm/internal.h            |    2 +
>>  mm/memcontrol.c          |    2 +-
>>  mm/migrate.c             |   36 +++++++++++++++++++++
>>  mm/swap.c                |    2 +-
>>  mm/vmscan.c              |   79 +++++++++++++++++++++++++++++++++++++++++++--
>>  9 files changed, 129 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index e39aeec..3aa5ab6 100644
>> --- a/include/linux/migrate.h
>> +++ b/include/linux/migrate.h
>> @@ -9,6 +9,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
>>  #ifdef CONFIG_MIGRATION
>>  #define PAGE_MIGRATION 1
>>
>> +extern void putback_pages_lru(struct list_head *l);
>>  extern void putback_lru_pages(struct list_head *l);
>>  extern int migrate_page(struct address_space *,
>>  			struct page *, struct page *);
>> @@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>>  #else
>>  #define PAGE_MIGRATION 0
>>
>> +static inline void putback_pages_lru(struct list_head *l) {}
>>  static inline void putback_lru_pages(struct list_head *l) {}
>>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>>  		unsigned long private, bool offlining,
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index ca01ab2..35e80fb 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -102,6 +102,13 @@ struct page {
>>  #endif
>>  };
>>
>> +/* This structure is used for keeping the LRU order of an isolated page */
>> +struct pages_lru {
>> +	struct page *page;	/* isolated page */
>> +	struct page *prev_page; /* previous page of the isolated page in LRU order */
>> +	struct page *next_page; /* next page of the isolated page in LRU order */
>> +	struct list_head lru;
>> +};
>>  /*
>
> So this thing has to be allocated from somewhere. We can't put it
> on the stack as we're already in danger there so we must be using
> kmalloc. In the reclaim paths, this should be avoided obviously.
> For compaction, we might hurt the compaction success rates if pages
> are pinned with control structures. It's something to be wary of.

Actually, compaction's 32-page migration unit should not be a big problem;
a single page would be enough to hold the pages_lru array. But I admit that
dynamic allocation in the reclaim path and pinning a page in the compaction
path aren't good.

So I have been thinking about a statically allocated, pagevec-like approach
(a rough sketch is appended at the end of this mail). But that approach has
to assume the migration unit stays small, like SWAP_CLUSTER_MAX. That is
true today, but I would rather not depend on the size; to be free of the
size limitation, we would have to allocate a new page dynamically. Hmm, any
ideas? Personally, I don't have anything better than the pagevec-like
approach with 32 pages.

> At LSF/MM, I stated a preference for swapping the source and
> destination pages in the LRU. This unfortunately means that the LRU

I can't understand your point. Swapping source and destination?
Could you elaborate on it?

> now contains a page in the process of being migrated to and the backout
> paths for migration failure become a lot more complex. Reclaim should
> be ok as it should fail to lock the page and recycle it in the list.
> This avoids allocations but I freely admit that I'm not in the position
> to implement such a thing right now :(
>
>>  * A region containing a mapping of a non-memory backed file under NOMMU
>>  * conditions.  These are held in a global tree and are pinned by the VMAs that
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index baef4ad..4ad0a0c 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -227,6 +227,8 @@ extern void rotate_reclaimable_page(struct page *page);
>>  extern void deactivate_page(struct page *page);
>>  extern void swap_setup(void);
>>
>> +extern void update_page_reclaim_stat(struct zone *zone, struct page *page,
>> +				     int file, int rotated);
>>  extern void add_page_to_unevictable_list(struct page *page);
>>
>>  /**
>> @@ -260,7 +262,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>>  						struct zone *zone,
>>  						unsigned long *nr_scanned);
>>  extern int __isolate_lru_page(struct page *page, int mode, int file,
>> -				int not_dirty, int not_mapped);
>> +		int not_dirty, int not_mapped, struct pages_lru *pages_lru);
>>  extern unsigned long shrink_all_memory(unsigned long nr_pages);
>>  extern int vm_swappiness;
>>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 653b02b..c453000 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -335,7 +335,8 @@ static unsigned long isolate_migratepages(struct zone *zone,
>>  		}
>>
>>  		/* Try isolate the page */
>> -		if (__isolate_lru_page(page, ISOLATE_BOTH, 0, !cc->sync, 0) != 0)
>> +		if (__isolate_lru_page(page, ISOLATE_BOTH, 0,
>> +					!cc->sync, 0, NULL) != 0)
>>  			continue;
>>
>>  		VM_BUG_ON(PageTransCompound(page));
>> diff --git a/mm/internal.h b/mm/internal.h
>> index d071d38..3c8182c 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -43,6 +43,8 @@ extern unsigned long highest_memmap_pfn;
>>   * in mm/vmscan.c:
>>   */
>>  extern int isolate_lru_page(struct page *page);
>> +extern bool keep_lru_order(struct pages_lru *pages_lru);
>> +extern void putback_page_to_lru(struct page *page, struct list_head *head);
>>  extern void putback_lru_page(struct page *page);
>>
>>  /*
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 471e7fd..92a9046 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1193,7 +1193,7 @@ unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
>>  			continue;
>>
>>  		scan++;
>> -		ret = __isolate_lru_page(page, mode, file, 0, 0);
>> +		ret = __isolate_lru_page(page, mode, file, 0, 0, NULL);
>>  		switch (ret) {
>>  		case 0:
>>  			list_move(&page->lru, dst);
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 819d233..9cfb63b 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -85,6 +85,42 @@ void putback_lru_pages(struct list_head *l)
>>  }
>>
>>  /*
>> + * This function is almost the same as putback_lru_pages.
>> + * The difference is that it receives a list of struct pages_lru and,
>> + * if possible, adds the pages back to their original LRU positions
>> + * instead of to the head of the LRU.
>> + */
>> +void putback_pages_lru(struct list_head *l)
>> +{
>> +	struct pages_lru *isolated_page;
>> +	struct pages_lru *isolated_page2;
>> +	struct page *page;
>> +
>> +	list_for_each_entry_safe(isolated_page, isolated_page2, l, lru) {
>> +		struct zone *zone;
>> +		page = isolated_page->page;
>> +		list_del(&isolated_page->lru);
>> +
>> +		dec_zone_page_state(page, NR_ISOLATED_ANON +
>> +				page_is_file_cache(page));
>> +
>> +		zone = page_zone(page);
>> +		spin_lock_irq(&zone->lru_lock);
>> +		if (keep_lru_order(isolated_page)) {
>> +			putback_page_to_lru(page, &isolated_page->prev_page->lru);
>> +			spin_unlock_irq(&zone->lru_lock);
>> +		} else {
>> +			spin_unlock_irq(&zone->lru_lock);
>> +			putback_lru_page(page);
>> +		}
>> +
>> +		kfree(isolated_page);
>> +	}
>> +}
>
> I think we also need counters, at least at the discussion stage, to see
> how successful this is.

Actually, I have some numbers but haven't published them.
(4 cores, 1G DRAM, THP set to "always", heavy stress: qemu, compile, web
browsing, eclipse and other programs forking; the success rate is about 50%.)
That's because I have another idea to solve the contiguous-PFN problem;
please see the description, and my reply to Rik's question, earlier in this
thread. I guess that approach could achieve a high success rate as long as
there are no concurrent direct compaction processes.

> For example, early in the system's lifetime there is a causal relationship
> between the age of a page and its location in physical memory. The
> buddy allocator gives pages back in PFN order where possible and
> there is a loose relationship between when pages get allocated and
> when they get reclaimed. As compaction is a linear scanner, there
> is a likelihood (that is highly variable) that physically contiguous
> pages will have similar positions in the LRU. They'll be isolated at
> the same time, meaning they also won't be put back in order.

Indeed! But I hope my second idea would solve the problem.
What do you think about it?

> This might cease to matter once the system has been running for some time,
> but it's a concern.

Yes. It depends on the aging of the workloads. I think it's not easy to get
a meaningful number that justifies everything. :(

--
Kind regards,
Minchan Kim
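
P.S. The mm/vmscan.c hunk with the two new helpers isn't quoted above, so
here is an untested sketch of what keep_lru_order() and putback_page_to_lru()
could look like, just to make the discussion concrete. It is not the actual
patch body; it assumes the pages_lru fields above, that isolation stores NULL
neighbours when the page sat at the head or tail of the list, the existing mm
helpers (PageLRU, page_lru, is_file_lru, mem_cgroup_add_lru_list), and the
update_page_reclaim_stat() export from the swap.h hunk.

/*
 * Sketch: with zone->lru_lock held, check whether the isolated page's
 * old neighbours are still adjacent on the LRU, i.e. the
 * "if (P2->next == P4)" test from the changelog.
 */
bool keep_lru_order(struct pages_lru *pages_lru)
{
	struct page *prev = pages_lru->prev_page;
	struct page *next = pages_lru->next_page;

	/* NULL means the page was isolated from the head or tail */
	if (!prev || !next)
		return false;
	/* Both old neighbours must still be on an LRU list... */
	if (!PageLRU(prev) || !PageLRU(next))
		return false;
	/* ...and still adjacent, so the old slot between them is intact */
	return prev->lru.next == &next->lru;
}

/*
 * Sketch: like add_page_to_lru_list(), but splice @page in right after
 * @head (its old prev_page) instead of at the head of the list.
 * Caller holds zone->lru_lock; @page must not be on the LRU yet.
 */
void putback_page_to_lru(struct page *page, struct list_head *head)
{
	struct zone *zone = page_zone(page);
	enum lru_list lru = page_lru(page);

	VM_BUG_ON(PageLRU(page));
	SetPageLRU(page);
	list_add(&page->lru, head);	/* insert after the old prev_page */
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, hpage_nr_pages(page));
	mem_cgroup_add_lru_list(page, lru);
	update_page_reclaim_stat(zone, page, is_file_lru(lru), 0);
}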
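
And for the statically allocated, pagevec-like idea mentioned above, something
like the following is what I have in mind. The names and the size constant are
made up; it just bounds the bookkeeping to one SWAP_CLUSTER_MAX(32)-sized
migration chunk so it can live in static or per-CPU storage (well under a page
on 64-bit) instead of one kmalloc'ed pages_lru per isolated page.

/* Hypothetical carrier, sized for today's 32-page migration unit */
#define PAGES_LRU_NR	32	/* assumes SWAP_CLUSTER_MAX-sized chunks */

struct pages_lru_vec {
	unsigned long nr;			/* entries in use */
	struct pages_lru pages[PAGES_LRU_NR];	/* one slot per isolated page */
};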