Re: [RFC]vmscan: doing page_referenced() in batch way

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/06/2010 04:10 PM, Andrew Morton wrote:
On Wed, 29 Sep 2010 10:57:33 +0800
Shaohua Li<shaohua.li@xxxxxxxxx>  wrote:

when memory pressure is high, page_referenced() causes a lot of lock contention
for anon_vma->lock or mapping->i_mmap_lock. Considering pages from one file
usually live side by side in LRU list, we can lock several pages in
shrink_page_list() and do batch page_referenced() to avoid some lock/unlock,
which should reduce lock contention a lot. The locking rule documented in
rmap.c is:
page_lock
	mapping->i_mmap_lock
		anon_vma->lock
For a batch of pages, we do page lock for all of them first and check their
reference, and then release their i_mmap_lock or anon_vma lock. This seems not
break the rule to me.
Before I further polish the patch, I'd like to know if there is anything
preventing us to do such batch here.

The patch adds quite a bit of complexity, so we'd need to see benchmark
testing results which justify it, please.

Also, the entire patch is irrelevant for uniprocessor machines, so the
runtime overhead and code-size increases for CONFIG_SMP=n builds should
be as low as possible - ideally zero.  Please quantify this as well
within the changelog if you pursue this work.


...

+#define PRC_PAGE_NUM 8
+struct page_reference_control {
+	int num;
+	struct page *pages[PRC_PAGE_NUM];
+	int references[PRC_PAGE_NUM];
+	struct anon_vma *anon_vma;
+	struct address_space *mapping;
+	/* no ksm */
+};

hm, 120 bytes of stack consumed, deep in page reclaim.

  #endif

  extern int hwpoison_filter(struct page *p);

...

  static int page_referenced_file(struct page *page,
  				struct mem_cgroup *mem_cont,
-				unsigned long *vm_flags)
+				unsigned long *vm_flags,
+				struct page_reference_control *prc)
  {
  	unsigned int mapcount;
  	struct address_space *mapping = page->mapping;
@@ -603,8 +623,25 @@ static int page_referenced_file(struct p
  	 */
  	BUG_ON(!PageLocked(page));

-	spin_lock(&mapping->i_mmap_lock);
+	if (prc) {
+		if (mapping == prc->mapping) {
+			goto skip_lock;
+		}
+		if (prc->anon_vma) {
+			page_unlock_anon_vma(prc->anon_vma);
+			prc->anon_vma = NULL;
+		}
+		if (prc->mapping) {
+			spin_unlock(&prc->mapping->i_mmap_lock);
+			prc->mapping = NULL;
+		}
+		prc->mapping = mapping;
+
+		spin_lock(&mapping->i_mmap_lock);
+	} else
+		spin_lock(&mapping->i_mmap_lock);

Move the spin_lock() outside, remove the `else' part.

+skip_lock:
  	/*
  	 * i_mmap_lock does not stabilize mapcount at all, but mapcount
  	 * is more likely to be accurate if we note it after spinning.
@@ -628,7 +665,8 @@ static int page_referenced_file(struct p
  			break;
  	}

-	spin_unlock(&mapping->i_mmap_lock);
+	if (!prc)
+		spin_unlock(&mapping->i_mmap_lock);
  	return referenced;
  }


...

+static void do_prc_batch(struct scan_control *sc,
+	struct page_reference_control *prc)
+{
+	int i;
+	for (i = 0; i<  prc->num; i++)
+		prc->references[i] = page_check_references(prc->pages[i], sc,
+			prc);
+	/*
+	 * we must release all locks here, the lock ordering requries
+	 * pagelock->
+	 *   mapping->i_mmap_lock->
+	 *     anon_vma->lock
+	 * release lock guarantee we don't break the rule in next run
+	 */
+	if (prc->anon_vma) {
+		page_unlock_anon_vma(prc->anon_vma);
+		prc->anon_vma = NULL;
+	}
+	if (prc->mapping) {
+		spin_unlock(&prc->mapping->i_mmap_lock);
+		prc->mapping = NULL;
+	}
+}

I didn't check the locking alterations.

I've tried to wrap my head around them, but haven't
yet been able to convince myself that it is safe.

What if we have multiple pages sharing the same
mapping or root anon_vma, which is not the same
as the prc->anon_vma ?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]