Re: [RFC v3] Support volatile range for anon vma

On Tue, Dec 11, 2012 at 11:41:04AM +0900, Minchan Kim wrote:
>Sorry, resending with the compile error fixed. :(
>
>>From 0cfd3b65e4e90ab59abe8a337334414f92423cad Mon Sep 17 00:00:00 2001
>From: Minchan Kim <minchan@xxxxxxxxxx>
>Date: Tue, 11 Dec 2012 11:38:30 +0900
>Subject: [RFC v3] Support volatile range for anon vma
>
>This is still [RFC v3] because it has only passed my simple test
>with a tweaked TCMalloc.
>
>I hope for more input from user-space allocator people, and for the
>patch to be tested with their allocators, because it might require a
>change to their arena management design to get real value.
>
>Changelog from v2
>
> * Remove madvise(addr, length, MADV_NOVOLATILE).
> * Add a vmstat counter for the number of discarded volatile pages.
> * Discard volatile pages without promotion in the reclaim path.
>
>This is based on v3.6.
>
>- What's the madvise(addr, length, MADV_VOLATILE)?
>
>  It's a hint that the user delivers to the kernel so that the kernel
>  can *discard* pages in the range at any time.
>
>- What happens if the user accesses a page (ie, virtual address) that
>  was discarded by the kernel?
>
>  The user sees zero-fill-on-demand pages, as with madvise(DONTNEED).
>
>- What happens if the user accesses a page (ie, virtual address) that
>  was not discarded by the kernel?
>
>  The user sees the old data without a page fault.
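
As a minimal user-space sketch of the semantics above (not part of the
patch; MADV_VOLATILE == 5 comes from the mman-common.h hunk below):

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MADV_VOLATILE
	#define MADV_VOLATILE	5	/* value added by this patch */
	#endif

	#define LEN	(4UL << 20)	/* 4MB of anonymous memory */

	int main(void)
	{
		char *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED)
			return 1;

		memset(buf, 0xaa, LEN);		/* populate the pages */

		/* Hint: the kernel may discard any page in [buf, buf+LEN). */
		if (madvise(buf, LEN, MADV_VOLATILE))
			perror("madvise");

		/*
		 * Later access: if no memory pressure happened we still see
		 * 0xaa; if the page was discarded we see a zero-filled page,
		 * just as after madvise(MADV_DONTNEED).
		 */
		printf("first byte: %#x\n", (unsigned char)buf[0]);
		munmap(buf, LEN);
		return 0;
	}
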
>
>- What's different from madvise(DONTNEED)?
>
>  System call semantics
>
>  DONTNEED guarantees that the user always sees zero-fill pages after
>  the call, while after VOLATILE the user may see either zero-fill
>  pages or the old data.
>
>  Internal implementation
>
>  madvise(DONTNEED) has to zap all mapped pages in the range, so its
>  overhead increases linearly with the number of mapped pages. On top
>  of that, if the user later writes to a zapped page, a page fault +
>  page allocation + memset has to happen.
>
>  madvise(VOLATILE) only marks a flag on the range (ie, the VMA). It
>  doesn't touch the pages at all, so the overhead of the system call
>  is very small. If memory pressure happens, the VM can discard pages
>  in VMAs marked VOLATILE. If the user then writes to an address whose
>  page was discarded by the VM, he sees a zero-fill page, so the cost
>  is the same as DONTNEED; but if memory pressure wasn't severe, the
>  user sees the old data without (page fault + page allocation + memset).
>
>  The VOLATILE mark is removed in the page fault handler when the
>  first page fault occurs in the marked vma, so subsequent page faults
>  follow the normal page fault path. That's why the user doesn't need
>  a madvise(MADV_NOVOLATILE) interface.
>
>- What's the benefit compared to DONTNEED?
>
>  1. The system call overhead is smaller because VOLATILE just marks
>     a flag on the VMA instead of zapping every page in the range.
>
>  2. It has a chance to eliminate the later overhead (ex, page fault +
>     page allocation + memset(PAGE_SIZE)).
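
To make the comparison concrete, a hedged sketch of how an allocator's
free/reuse path could use this; span_free()/span_reuse() are hypothetical
names, not an existing tcmalloc or ptmalloc interface:

	#include <stddef.h>
	#include <sys/mman.h>

	#ifndef MADV_VOLATILE
	#define MADV_VOLATILE	5	/* value added by this patch */
	#endif

	static void span_free(void *start, size_t len)
	{
		/*
		 * MADV_DONTNEED would zap every mapped pte right now (cost
		 * grows with len) and would guarantee fault + alloc + memset
		 * on the next reuse.  MADV_VOLATILE only flags the VMA; the
		 * pages are discarded only if memory pressure actually hits.
		 */
		madvise(start, len, MADV_VOLATILE);
	}

	static void *span_reuse(void *start, size_t len)
	{
		/*
		 * No MADV_NOVOLATILE is needed: the first fault in the marked
		 * vma clears the flag.  The span can be handed out directly;
		 * its old contents are simply unreliable now (a discarded page
		 * comes back zero-filled), which is fine for malloc-style
		 * allocations that are uninitialized anyway.
		 */
		(void)len;
		return start;
	}
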
>
>- Isn't there any drawback?
>
>  DONTNEED doesn't need exclusive mmap_sem locking, so concurrent page
>  faults from other threads are still allowed. But VOLATILE needs
>  exclusive mmap_sem, so other threads are blocked if they try to
>  access not-yet-mapped pages. That's why I designed madvise(VOLATILE)'s
>  overhead to be as small as possible.
>
>  The other concern with exclusive mmap_sem is when a page fault occurs
>  in a VOLATILE-marked vma. We have to remove the flag from the vma and
>  merge adjacent vmas, which needs exclusive mmap_sem. That can slow
>  down page fault handling and prevent concurrent page faults. But such
>  handling is needed only once, on the first page fault after we mark
>  the VMA VOLATILE, and only if memory pressure happened so a page was
>  discarded. So it isn't common, and the benefit we get from this
>  feature should be bigger than the loss.
>
>- What is this targeting?
>
>  Firstly, user-space allocators like ptmalloc and tcmalloc, or the heap
>  management of virtual machines like Dalvik. It also comes in handy for
>  embedded systems which have no swap device and therefore can't reclaim
>  anonymous pages. By discarding instead of swapping out, it can be used
>  on such non-swap systems. For that, we have to age the anon lru list
>  even though we have no swap, because I don't want volatile pages to be
>  discarded with top priority when memory pressure happens: volatile in
>  this patch means "We don't need to swap out because the user can handle
>  data disappearing suddenly", NOT "They are useless, so hurry up and
>  reclaim them". So I want to apply the same aging rule as for normal
>  pages to them.
>
>  Background aging of anonymous pages on a non-swap system is the
>  trade-off for getting this feature. We even did that until [1] was
>  merged two years ago, and I believe the gain from this patch will beat
>  the loss from the anon lru aging overhead once allocators start to use
>  madvise. (This patch doesn't include background aging for the non-swap
>  case, but it's trivial to add if we decide to.)
>
>[1] 74e3f3c3, vmscan: prevent background aging of anon page in no swap system
>
>Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
>Cc: Arun Sharma <asharma@xxxxxx>
>Cc: sanjay@xxxxxxxxxx
>Cc: Paul Turner <pjt@xxxxxxxxxx>
>CC: David Rientjes <rientjes@xxxxxxxxxx>
>Cc: John Stultz <john.stultz@xxxxxxxxxx>
>Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>Cc: Christoph Lameter <cl@xxxxxxxxx>
>Cc: Android Kernel Team <kernel-team@xxxxxxxxxxx>
>Cc: Robert Love <rlove@xxxxxxxxxx>
>Cc: Mel Gorman <mel@xxxxxxxxx>
>Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>Cc: Dave Hansen <dave@xxxxxxxxxxxxxxxxxx>
>Cc: Rik van Riel <riel@xxxxxxxxxx>
>Cc: Dave Chinner <david@xxxxxxxxxxxxx>
>Cc: Neil Brown <neilb@xxxxxxx>
>Cc: Mike Hommey <mh@xxxxxxxxxxxx>
>Cc: Taras Glek <tglek@xxxxxxxxxxx>
>Cc: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxx>
>Cc: Christoph Lameter <cl@xxxxxxxxx>
>Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
>---
> arch/x86/mm/fault.c               |    2 +
> include/asm-generic/mman-common.h |    6 ++
> include/linux/mm.h                |    7 ++-
> include/linux/rmap.h              |   20 ++++++
> include/linux/vm_event_item.h     |    2 +-
> mm/madvise.c                      |   19 +++++-
> mm/memory.c                       |   32 ++++++++++
> mm/migrate.c                      |    6 +-
> mm/rmap.c                         |  125 ++++++++++++++++++++++++++++++++++++-
> mm/vmscan.c                       |    7 +++
> mm/vmstat.c                       |    1 +
> 11 files changed, 218 insertions(+), 9 deletions(-)
>
>diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>index 76dcd9d..17c1c20 100644
>--- a/arch/x86/mm/fault.c
>+++ b/arch/x86/mm/fault.c
>@@ -879,6 +879,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
> 		}
>
> 		out_of_memory(regs, error_code, address);
>+	} else if (fault & VM_FAULT_SIGSEG) {
>+			bad_area(regs, error_code, address);
> 	} else {
> 		if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
> 			     VM_FAULT_HWPOISON_LARGE))
>diff --git a/include/asm-generic/mman-common.h b/include/asm-generic/mman-common.h
>index d030d2c..f07781e 100644
>--- a/include/asm-generic/mman-common.h
>+++ b/include/asm-generic/mman-common.h
>@@ -34,6 +34,12 @@
> #define MADV_SEQUENTIAL	2		/* expect sequential page references */
> #define MADV_WILLNEED	3		/* will need these pages */
> #define MADV_DONTNEED	4		/* don't need these pages */
>+/*
>+ * Unlike other flags, we need two locks to protect MADV_VOLATILE.
>+ * For changing the flag, we need mmap_sem's write lock and volatile_lock
>+ * while we just need volatile_lock in case of reading the flag.
>+ */
>+#define MADV_VOLATILE	5		/* pages will disappear suddenly */
>
> /* common parameters: try to keep these consistent across architectures */
> #define MADV_REMOVE	9		/* remove these pages & resources */
>diff --git a/include/linux/mm.h b/include/linux/mm.h
>index 311be90..89027b5 100644
>--- a/include/linux/mm.h
>+++ b/include/linux/mm.h
>@@ -119,6 +119,7 @@ extern unsigned int kobjsize(const void *objp);
> #define VM_SAO		0x20000000	/* Strong Access Ordering (powerpc) */
> #define VM_PFN_AT_MMAP	0x40000000	/* PFNMAP vma that is fully mapped at mmap time */
> #define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
>+#define VM_VOLATILE	0x100000000	/* Pages in the vma can be discarded without swap */
>
> /* Bits set in the VMA until the stack is in its final location */
> #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
>@@ -143,7 +144,7 @@ extern unsigned int kobjsize(const void *objp);
>  * Special vmas that are non-mergable, non-mlock()able.
>  * Note: mm/huge_memory.c VM_NO_THP depends on this definition.
>  */
>-#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
>+#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP | VM_VOLATILE)
>
> /*
>  * mapping from the currently active vm_flags protection bits (the
>@@ -872,11 +873,11 @@ static inline int page_mapped(struct page *page)
> #define VM_FAULT_NOPAGE	0x0100	/* ->fault installed the pte, not return page */
> #define VM_FAULT_LOCKED	0x0200	/* ->fault locked the returned page */
> #define VM_FAULT_RETRY	0x0400	/* ->fault blocked, must retry */
>-
>+#define VM_FAULT_SIGSEG	0x0800	/* -> There is no vma */
> #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */
>
> #define VM_FAULT_ERROR	(VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \
>-			 VM_FAULT_HWPOISON_LARGE)
>+			 VM_FAULT_HWPOISON_LARGE | VM_FAULT_SIGSEG)
>
> /* Encode hstate index for a hwpoisoned large page */
> #define VM_FAULT_SET_HINDEX(x) ((x) << 12)
>diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>index 3fce545..735d7a3 100644
>--- a/include/linux/rmap.h
>+++ b/include/linux/rmap.h
>@@ -67,6 +67,9 @@ struct anon_vma_chain {
> 	struct list_head same_anon_vma;	/* locked by anon_vma->mutex */
> };
>
>+void volatile_lock(struct vm_area_struct *vma);
>+void volatile_unlock(struct vm_area_struct *vma);
>+
> #ifdef CONFIG_MMU
> static inline void get_anon_vma(struct anon_vma *anon_vma)
> {
>@@ -170,6 +173,7 @@ enum ttu_flags {
> 	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
> 	TTU_IGNORE_ACCESS = (1 << 9),	/* don't age */
> 	TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
>+	TTU_IGNORE_VOLATILE = (1 << 11),/* ignore volatile */
> };
> #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
>
>@@ -194,6 +198,21 @@ static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
> 	return ptep;
> }
>
>+pte_t *__page_check_volatile_address(struct page *, struct mm_struct *,
>+                                unsigned long, spinlock_t **);
>+
>+static inline pte_t *page_check_volatile_address(struct page *page,
>+                                        struct mm_struct *mm,
>+                                        unsigned long address,
>+                                        spinlock_t **ptlp)
>+{
>+        pte_t *ptep;
>+
>+        __cond_lock(*ptlp, ptep = __page_check_volatile_address(page,
>+                                        mm, address, ptlp));
>+        return ptep;
>+}
>+
> /*
>  * Used by swapoff to help locate where page is expected in vma.
>  */
>@@ -257,5 +276,6 @@ static inline int page_mkclean(struct page *page)
> #define SWAP_AGAIN	1
> #define SWAP_FAIL	2
> #define SWAP_MLOCK	3
>+#define SWAP_DISCARD	4
>
> #endif	/* _LINUX_RMAP_H */
>diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>index 57f7b10..3f9a40b 100644
>--- a/include/linux/vm_event_item.h
>+++ b/include/linux/vm_event_item.h
>@@ -23,7 +23,7 @@
>
> enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> 		FOR_ALL_ZONES(PGALLOC),
>-		PGFREE, PGACTIVATE, PGDEACTIVATE,
>+		PGFREE, PGVOLATILE, PGACTIVATE, PGDEACTIVATE,
> 		PGFAULT, PGMAJFAULT,
> 		FOR_ALL_ZONES(PGREFILL),
> 		FOR_ALL_ZONES(PGSTEAL_KSWAPD),
>diff --git a/mm/madvise.c b/mm/madvise.c
>index 14d260f..53a19d8 100644
>--- a/mm/madvise.c
>+++ b/mm/madvise.c
>@@ -86,6 +86,13 @@ static long madvise_behavior(struct vm_area_struct * vma,
> 		if (error)
> 			goto out;
> 		break;
>+	case MADV_VOLATILE:
>+		if (vma->vm_flags & VM_LOCKED) {
>+			error = -EINVAL;
>+			goto out;
>+		}
>+		new_flags |= VM_VOLATILE;
>+		break;
> 	}
>
> 	if (new_flags == vma->vm_flags) {
>@@ -118,9 +125,13 @@ static long madvise_behavior(struct vm_area_struct * vma,
> success:
> 	/*
> 	 * vm_flags is protected by the mmap_sem held in write mode.
>+	 * In case of MADV_VOLATILE, we need anon_vma_lock additionally.
> 	 */
>+	if (behavior == MADV_VOLATILE)
>+		volatile_lock(vma);
> 	vma->vm_flags = new_flags;
>-
>+	if (behavior == MADV_VOLATILE)
>+		volatile_unlock(vma);
> out:
> 	if (error == -ENOMEM)
> 		error = -EAGAIN;
>@@ -310,6 +321,7 @@ madvise_behavior_valid(int behavior)
> #endif
> 	case MADV_DONTDUMP:
> 	case MADV_DODUMP:
>+	case MADV_VOLATILE:
> 		return 1;
>
> 	default:
>@@ -385,6 +397,11 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
> 		goto out;
> 	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
>
>+	if (behavior != MADV_VOLATILE)
>+		len = (len_in + ~PAGE_MASK) & PAGE_MASK;
>+	else
>+		len = len_in & PAGE_MASK;
>+
> 	/* Check to see whether len was rounded up from small -ve to zero */
> 	if (len_in && !len)
> 		goto out;
>diff --git a/mm/memory.c b/mm/memory.c
>index 5736170..b5e4996 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -57,6 +57,7 @@
> #include <linux/swapops.h>
> #include <linux/elf.h>
> #include <linux/gfp.h>
>+#include <linux/mempolicy.h>
>
> #include <asm/io.h>
> #include <asm/pgalloc.h>
>@@ -3446,6 +3447,37 @@ int handle_pte_fault(struct mm_struct *mm,
> 					return do_linear_fault(mm, vma, address,
> 						pte, pmd, flags, entry);
> 			}
>+			if (vma->vm_flags & VM_VOLATILE) {
>+				struct vm_area_struct *prev;
>+
>+				up_read(&mm->mmap_sem);
>+				down_write(&mm->mmap_sem);
>+				vma = find_vma_prev(mm, address, &prev);
>+
>+				/* Someone unmapped the vma */
>+				if (unlikely(!vma) || vma->vm_start > address) {
>+					downgrade_write(&mm->mmap_sem);
>+					return VM_FAULT_SIGSEG;
>+				}
>+				/* Someone else already handled it */
>+				if (vma->vm_flags & VM_VOLATILE) {
>+					/*
>+					 * From now on, we hold mmap_sem as
>+					 * exclusive.
>+					 */
>+					volatile_lock(vma);
>+					vma->vm_flags &= ~VM_VOLATILE;
>+					volatile_unlock(vma);
>+
>+					vma_merge(mm, prev, vma->vm_start,
>+						vma->vm_end, vma->vm_flags,
>+						vma->anon_vma, vma->vm_file,
>+						vma->vm_pgoff, vma_policy(vma));
>+
>+				}
>+
>+				downgrade_write(&mm->mmap_sem);
>+			}
> 			return do_anonymous_page(mm, vma, address,
> 						 pte, pmd, flags);
> 		}
>diff --git a/mm/migrate.c b/mm/migrate.c
>index 77ed2d7..08b009c 100644
>--- a/mm/migrate.c
>+++ b/mm/migrate.c
>@@ -800,7 +800,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> 	}
>
> 	/* Establish migration ptes or remove ptes */
>-	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>+	try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|
>+				TTU_IGNORE_ACCESS|TTU_IGNORE_VOLATILE);
>
> skip_unmap:
> 	if (!page_mapped(page))
>@@ -915,7 +916,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> 	if (PageAnon(hpage))
> 		anon_vma = page_get_anon_vma(hpage);
>
>-	try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>+	try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|
>+				TTU_IGNORE_ACCESS|TTU_IGNORE_VOLATILE);
>
> 	if (!page_mapped(hpage))
> 		rc = move_to_new_page(new_hpage, hpage, 1, mode);
>diff --git a/mm/rmap.c b/mm/rmap.c
>index 0f3b7cd..1a0ab2b 100644
>--- a/mm/rmap.c
>+++ b/mm/rmap.c
>@@ -603,6 +603,57 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
> 	return vma_address(page, vma);
> }
>
>+pte_t *__page_check_volatile_address(struct page *page, struct mm_struct *mm,
>+		unsigned long address, spinlock_t **ptlp)
>+{
>+	pgd_t *pgd;
>+	pud_t *pud;
>+	pmd_t *pmd;
>+	pte_t *pte;
>+	spinlock_t *ptl;
>+
>+	swp_entry_t entry = { .val = page_private(page) };
>+
>+	if (unlikely(PageHuge(page))) {
>+		pte = huge_pte_offset(mm, address);
>+		ptl = &mm->page_table_lock;
>+		goto check;
>+	}
>+
>+	pgd = pgd_offset(mm, address);
>+	if (!pgd_present(*pgd))
>+		return NULL;
>+
>+	pud = pud_offset(pgd, address);
>+	if (!pud_present(*pud))
>+		return NULL;
>+
>+	pmd = pmd_offset(pud, address);
>+	if (!pmd_present(*pmd))
>+		return NULL;
>+	if (pmd_trans_huge(*pmd))
>+		return NULL;
>+
>+	pte = pte_offset_map(pmd, address);
>+	ptl = pte_lockptr(mm, pmd);
>+check:
>+	spin_lock(ptl);
>+	if (PageAnon(page)) {
>+		if (!pte_present(*pte) && entry.val ==
>+				pte_to_swp_entry(*pte).val) {
>+			*ptlp = ptl;
>+			return pte;
>+		}
>+	} else {
>+		if (pte_none(*pte)) {
>+			*ptlp = ptl;
>+			return pte;
>+		}
>+	}
>+	pte_unmap_unlock(pte, ptl);
>+	return NULL;
>+}
>+
> /*
>  * Check that @page is mapped at @address into @mm.
>  *
>@@ -1218,6 +1269,35 @@ out:
> 		mem_cgroup_end_update_page_stat(page, &locked, &flags);
> }
>
>+int try_to_zap_one(struct page *page, struct vm_area_struct *vma,
>+                unsigned long address)
>+{
>+        struct mm_struct *mm = vma->vm_mm;
>+        pte_t *pte;
>+        pte_t pteval;
>+        spinlock_t *ptl;
>+
>+        pte = page_check_volatile_address(page, mm, address, &ptl);
>+        if (!pte)
>+                return 0;
>+
>+        /* Nuke the page table entry. */
>+        flush_cache_page(vma, address, page_to_pfn(page));
>+        pteval = ptep_clear_flush(vma, address, pte);
>+
>+        if (PageAnon(page)) {
>+                swp_entry_t entry = { .val = page_private(page) };
>+                if (PageSwapCache(page)) {
>+                        dec_mm_counter(mm, MM_SWAPENTS);
>+                        swap_free(entry);
>+                }
>+        }
>+
>+        pte_unmap_unlock(pte, ptl);
>+        mmu_notifier_invalidate_page(mm, address);
>+        return 1;
>+}
>+
> /*
>  * Subfunctions of try_to_unmap: try_to_unmap_one called
>  * repeatedly from try_to_unmap_ksm, try_to_unmap_anon or try_to_unmap_file.
>@@ -1494,6 +1574,10 @@ static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
> 	struct anon_vma *anon_vma;
> 	struct anon_vma_chain *avc;
> 	int ret = SWAP_AGAIN;
>+	bool is_volatile = true;
>+
>+	if (flags & TTU_IGNORE_VOLATILE)
>+		is_volatile = false;
>
> 	anon_vma = page_lock_anon_vma(page);
> 	if (!anon_vma)
>@@ -1512,17 +1596,40 @@ static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
> 		 * temporary VMAs until after exec() completes.
> 		 */
> 		if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>-				is_vma_temporary_stack(vma))
>+				is_vma_temporary_stack(vma)) {
>+			is_volatile = false;
> 			continue;
>+		}
>
> 		address = vma_address(page, vma);
> 		if (address == -EFAULT)
> 			continue;
>+                /*
>+                 * A volatile page will only be purged if ALL vmas
>+		 * pointing to it are VM_VOLATILE.
>+                 */
>+                if (!(vma->vm_flags & VM_VOLATILE))
>+                        is_volatile = false;
>+
> 		ret = try_to_unmap_one(page, vma, address, flags);
> 		if (ret != SWAP_AGAIN || !page_mapped(page))
> 			break;
> 	}
>
>+        if (page_mapped(page) || is_volatile == false)
>+                goto out;
>+
>+        list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
>+                struct vm_area_struct *vma = avc->vma;
>+                unsigned long address;
>+
>+                address = vma_address(page, vma);
>+                try_to_zap_one(page, vma, address);
>+        }
>+        /* We're throwing this page out, so mark it clean */
>+        ClearPageDirty(page);
>+        ret = SWAP_DISCARD;
>+out:
> 	page_unlock_anon_vma(anon_vma);
> 	return ret;
> }
>@@ -1651,6 +1758,7 @@ out:
>  * SWAP_AGAIN	- we missed a mapping, try again later
>  * SWAP_FAIL	- the page is unswappable
>  * SWAP_MLOCK	- page is mlocked.
>+ * SWAP_DISCARD - page is volatile.
>  */
> int try_to_unmap(struct page *page, enum ttu_flags flags)
> {
>@@ -1665,7 +1773,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> 		ret = try_to_unmap_anon(page, flags);
> 	else
> 		ret = try_to_unmap_file(page, flags);
>-	if (ret != SWAP_MLOCK && !page_mapped(page))
>+	if (ret != SWAP_MLOCK && !page_mapped(page) &&
>+					ret != SWAP_DISCARD)
> 		ret = SWAP_SUCCESS;
> 	return ret;
> }
>@@ -1707,6 +1816,18 @@ void __put_anon_vma(struct anon_vma *anon_vma)
> 	anon_vma_free(anon_vma);
> }
>
>+void volatile_lock(struct vm_area_struct *vma)
>+{
>+        if (vma->anon_vma)
>+                anon_vma_lock(vma->anon_vma);
>+}
>+
>+void volatile_unlock(struct vm_area_struct *vma)
>+{
>+        if (vma->anon_vma)
>+                anon_vma_unlock(vma->anon_vma);
>+}
>+
> #ifdef CONFIG_MIGRATION
> /*
>  * rmap_walk() and its helpers rmap_walk_anon() and rmap_walk_file():
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index 99b434b..4e463a4 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -630,6 +630,9 @@ static enum page_references page_check_references(struct page *page,
> 	if (vm_flags & VM_LOCKED)
> 		return PAGEREF_RECLAIM;
>
>+	if (vm_flags & VM_VOLATILE)
>+		return PAGEREF_RECLAIM;
>+
> 	if (referenced_ptes) {
> 		if (PageSwapBacked(page))
> 			return PAGEREF_ACTIVATE;
>@@ -789,6 +792,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> 		 */

Hi Minchan,

IIUC, the anonymous page has already been added to the swap cache through
add_to_swap() called from shrink_page_list(), but I can't figure out where
you remove it from the swap cache.

Regards,
Wanpeng Li 

> 		if (page_mapped(page) && mapping) {
> 			switch (try_to_unmap(page, TTU_UNMAP)) {
>+			case SWAP_DISCARD:
>+				count_vm_event(PGVOLATILE);
>+				goto discard_page;
> 			case SWAP_FAIL:
> 				goto activate_locked;
> 			case SWAP_AGAIN:
>@@ -857,6 +863,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> 			}
> 		}
>
>+discard_page:
> 		/*
> 		 * If the page has buffers, try to free the buffer mappings
> 		 * associated with this page. If we succeed we try to free
>diff --git a/mm/vmstat.c b/mm/vmstat.c
>index df7a674..410caf5 100644
>--- a/mm/vmstat.c
>+++ b/mm/vmstat.c
>@@ -734,6 +734,7 @@ const char * const vmstat_text[] = {
> 	TEXTS_FOR_ZONES("pgalloc")
>
> 	"pgfree",
>+	"pgvolatile",
> 	"pgactivate",
> 	"pgdeactivate",
>
>-- 
>1.7.9.5
>
>-- 
>Kind regards,
>Minchan Kim
>


