Re: Repeated fork() causes SLAB to grow without bound

On Thu, Aug 16, 2012 at 02:58:45PM -0400, Rik van Riel wrote:

> Oh dear.
> 
> Basically, what happens is that at fork time, a new
> "level" is created for the anon_vma hierarchy. This
> works great for normal forking daemons, since the
> parent process just keeps running, and forking off
> children.
> 
> Look at anon_vma_fork() in mm/rmap.c for the details.
> 
> Having each child become the new parent, and the
> previous parent exit, can result in an "infinite"
> stack of anon_vmas.
> 
> Now, the parent anon_vma we cannot get rid of,
> because that is where the anon_vma lock lives.
> 
> However, in your case you have many more anon_vma
> levels than you have processes!
> 
> I wonder if it may be possible to fix your bug
> by adding a refcount to the struct anon_vma,
> one count for each VMA that is directly attached
> to the anon_vma (ie. vma->anon_vma == anon_vma),
> and one for each page that points to the anon_vma.
> 
> If the reference count on an anon_vma reaches 0,
> we can skip that anon_vma in anon_vma_clone, and
> the child process should not get that anon_vma.
> 
> A scheme like that may be enough to avoid the trouble
> you are running into.
> 
> Does this sound realistic?

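For reference, the fork pattern described above (each child becomes the
new parent and the previous parent exits) can be sketched in userspace
roughly as follows. This is only an illustration, not the original test
program: run_fork_chain() and its pipe-based reporting are made up for
this sketch, and the chain is bounded so it terminates.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a chain of `depth` generations (depth >= 1).
 * Every generation forks once; the parent exits and the child carries on.
 * The last generation reports how far the chain got through a pipe.
 * Returns the number of generations reached, or -1 on error.
 */
int run_fork_chain(int depth)
{
	int fds[2];
	int reached = -1;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		int gen = 1;		/* this process is generation 1 */

		close(fds[0]);
		while (gen < depth) {
			pid_t next = fork();

			if (next < 0)
				break;		/* report how far we got */
			if (next > 0)
				_exit(0);	/* previous parent exits */
			gen++;			/* child is the new parent */
		}
		if (write(fds[1], &gen, sizeof(gen)) != (ssize_t)sizeof(gen))
			_exit(1);
		_exit(0);
	}

	/* Original process: wait for the last generation's report. */
	close(fds[1]);
	n = read(fds[0], &reached, sizeof(reached));
	close(fds[0]);
	waitpid(pid, NULL, 0);

	return n == (ssize_t)sizeof(reached) ? reached : -1;
}
```

Calling run_fork_chain() with a large depth while watching the anon_vma
and anon_vma_chain lines in /proc/slabinfo should be one way to observe
the growth while the chain is alive. A bounded chain like this one
releases everything once its last process exits.
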
Based on your comments, I came up with the following patch.  It boots
and the anon_vma/anon_vma_chain SLAB usage is stable, but I don't know
if I've overlooked something.  I'm not a kernel hacker.
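
The rule the patch below adds to anon_vma_clone() can be modeled in
userspace as follows. This is a simplified sketch under my own
assumptions: the toy_* types and toy_clone() are hypothetical, and the
model ignores locking, the root anon_vma, and memory allocation.

```c
#include <stddef.h>

#define TOY_MAX_CHAIN 64

/* Toy stand-in for struct anon_vma: only the user count matters here. */
struct toy_anon_vma {
	int pagecount;	/* directly attached VMAs plus pages pointing here */
};

/* Toy stand-in for a VMA and its chain of anon_vmas. */
struct toy_vma {
	struct toy_anon_vma *chain[TOY_MAX_CHAIN];
	size_t len;
};

/*
 * Copy src's chain into dst, skipping every anon_vma whose pagecount
 * is zero, so the child does not inherit levels nobody uses anymore.
 * Returns the number of entries linked into dst.
 */
size_t toy_clone(struct toy_vma *dst, const struct toy_vma *src)
{
	size_t i;

	dst->len = 0;
	for (i = 0; i < src->len; i++) {
		if (src->chain[i]->pagecount == 0)
			continue;	/* unused level: do not link it */
		dst->chain[dst->len++] = src->chain[i];
	}
	return dst->len;
}
```

With a parent chain whose counts are 2, 0 and 1, the clone ends up with
two entries, and the unused middle level is not carried forward.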


--- include/linux/rmap.h.ORIG	2011-08-05 04:59:21.000000000 +0000
+++ include/linux/rmap.h	2012-08-16 22:52:25.000000000 +0000
@@ -35,6 +35,7 @@ struct anon_vma {
 	 * anon_vma if they are the last user on release
 	 */
 	atomic_t refcount;
+	atomic_t pagecount;
 
 	/*
 	 * NOTE: the LSB of the head.next is set by
--- mm/rmap.c.ORIG	2011-08-05 04:59:21.000000000 +0000
+++ mm/rmap.c	2012-08-17 23:55:13.000000000 +0000
@@ -85,6 +85,7 @@ static inline struct anon_vma *anon_vma_
 static inline void anon_vma_free(struct anon_vma *anon_vma)
 {
 	VM_BUG_ON(atomic_read(&anon_vma->refcount));
+	VM_BUG_ON(atomic_read(&anon_vma->pagecount));
 
 	/*
 	 * Synchronize against page_lock_anon_vma() such that
@@ -176,6 +177,7 @@ int anon_vma_prepare(struct vm_area_stru
 		spin_lock(&mm->page_table_lock);
 		if (likely(!vma->anon_vma)) {
 			vma->anon_vma = anon_vma;
+			atomic_inc(&anon_vma->pagecount);
 			avc->anon_vma = anon_vma;
 			avc->vma = vma;
 			list_add(&avc->same_vma, &vma->anon_vma_chain);
@@ -262,7 +264,10 @@ int anon_vma_clone(struct vm_area_struct
 		}
 		anon_vma = pavc->anon_vma;
 		root = lock_anon_vma_root(root, anon_vma);
-		anon_vma_chain_link(dst, avc, anon_vma);
+		if (!atomic_read(&anon_vma->pagecount))
+			anon_vma_chain_free(avc);
+		else
+			anon_vma_chain_link(dst, avc, anon_vma);
 	}
 	unlock_anon_vma_root(root);
 	return 0;
@@ -314,6 +319,7 @@ int anon_vma_fork(struct vm_area_struct
 	get_anon_vma(anon_vma->root);
 	/* Mark this anon_vma as the one where our new (COWed) pages go. */
 	vma->anon_vma = anon_vma;
+	atomic_set(&anon_vma->pagecount, 1);
 	anon_vma_lock(anon_vma);
 	anon_vma_chain_link(vma, avc, anon_vma);
 	anon_vma_unlock(anon_vma);
@@ -341,6 +347,8 @@ void unlink_anon_vmas(struct vm_area_str
 
 		root = lock_anon_vma_root(root, anon_vma);
 		list_del(&avc->same_anon_vma);
+		if (vma->anon_vma == anon_vma)
+			atomic_dec(&anon_vma->pagecount);
 
 		/*
 		 * Leave empty anon_vmas on the list - we'll need
@@ -375,6 +383,7 @@ static void anon_vma_ctor(void *data)
 
 	mutex_init(&anon_vma->mutex);
 	atomic_set(&anon_vma->refcount, 0);
+	atomic_set(&anon_vma->pagecount, 0);
 	INIT_LIST_HEAD(&anon_vma->head);
 }
 
@@ -996,6 +1005,7 @@ static void __page_set_anon_rmap(struct
 	if (!exclusive)
 		anon_vma = anon_vma->root;
 
+	atomic_inc(&anon_vma->pagecount);
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
 	page->mapping = (struct address_space *) anon_vma;
 	page->index = linear_page_index(vma, address);
@@ -1142,6 +1152,11 @@ void page_remove_rmap(struct page *page)
 	if (unlikely(PageHuge(page)))
 		return;
 	if (PageAnon(page)) {
+		struct anon_vma *anon_vma;
+
+		anon_vma = page_anon_vma(page);
+		if (anon_vma)
+			atomic_dec(&anon_vma->pagecount);
 		mem_cgroup_uncharge_page(page);
 		if (!PageTransHuge(page))
 			__dec_zone_page_state(page, NR_ANON_PAGES);
@@ -1747,6 +1762,7 @@ static void __hugepage_set_anon_rmap(str
 	if (!exclusive)
 		anon_vma = anon_vma->root;
 
+	atomic_inc(&anon_vma->pagecount);
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
 	page->mapping = (struct address_space *) anon_vma;
 	page->index = linear_page_index(vma, address);

-- 
Daniel K. Forrest		Space Science and
dan.forrest@xxxxxxxxxxxxx	Engineering Center
(608) 890 - 0558		University of Wisconsin, Madison

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .