Re: mm, ksm: NULL ptr deref in unstable_tree_search_insert

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 19 Dec 2012, Petr Holasek wrote:
> On Tue, 18 Dec 2012, Hugh Dickins wrote:
> > On Tue, 18 Dec 2012, Sasha Levin wrote:
> > 
> > > Hi all,
> > > 
> > > While fuzzing with trinity inside a KVM tools guest, running latest linux-next kernel, I've
> > > stumbled on the following:
> > > 
> > > [  127.959264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
> > > [  127.960379] IP: [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> ...
> 
> > > 88 e9 b9 09 00 00 90 <48> 81 3b 60 59 22 86 b8 01 00 00 00 44 0f 44 e8 41 83 fc 01 77
> > > [  127.978032] RIP  [<ffffffff81185b60>] __lock_acquire+0xb0/0xa90
> > > [  127.978032]  RSP <ffff8800137abb78>
> > > [  127.978032] CR2: 0000000000000110
> > > [  127.978032] ---[ end trace 3dc1b0c5db8c1230 ]---
> > > 
> > > The relevant piece of code is:
> > > 
> > > 	static struct page *get_mergeable_page(struct rmap_item *rmap_item)
> > > 	{
> > > 	        struct mm_struct *mm = rmap_item->mm;
> > > 	        unsigned long addr = rmap_item->address;
> > > 	        struct vm_area_struct *vma;
> > > 	        struct page *page;
> > > 	
> > > 	        down_read(&mm->mmap_sem);
> > > 
> > > Where 'mm' is NULL. I'm not really sure how it happens though.
> > 
> > Thanks, yes, I got that, and it's not peculiar to fuzzing at all:
> > I'm testing the fix at the moment, but just hit something else too
> > (ksmd oops on NULL p->mm in task_numa_fault i.e. task_numa_placement).
> > 
> > For the moment, you're safer not to run KSM: configure it out or don't
> > set it to run.  Fixes to follow later, I'll try to remember to Cc you.

Sadly I couldn't send out fixes yesterday, because my testing then
met another more subtle problem, that I've still not yet resolved.

> > 
> 
> Hello all,
> 
> I've also tried fuzzing with trinity inside of kvm guest when tested KSM
> patch, but applied on top of 3.7-rc8, but didn't trigger that oops. So
> going to do the same testing on linux-next.

I haven't tried linux-next myself, it being in a transitional state
until 3.8-rc1.  I've been testing on Linus's git from last weekend
(head at a4f1de176614f634c367e5994a7bcc428c940df0 to be precise),
plus your patch, plus my fixes.

> 
> Hugh, does it seem like bug in unstable_tree_search_insert() you mentioned
> in yesterday email of something else?

Yes, Sasha's is exactly the one I got earlier: hitting in
get_mergeable_page() due to bug in unstable_tree_search_insert().

>From your next mail, I see you've started looking into it; so I'd
better show what I have at the moment, although I do hate posting
incomplete known-buggy patches: this on top of git + your patch.

The kernel/sched/fair.c part to go to Mel and 3.8-rc1 when I've a
moment to separate it out.

The CONFIG_NUMA section in stable_tree_append() I added late
last night, but it's not enough to fix the issue I'm still seeing:
merge_across_nodes 0 and 1 cases are stable, but switching between
them can bring pages_shared down to 0 but leave something behind
in the stable_tree.  I suspect it's a wrong-nid issue, but I've
not yet tracked it down with the BUG_ONs I've been adding (not
included below).

Removing the KSM_RUN_MERGE test: unimportant, but so far as I could
see it should be unnecessary, and if it is necessary, then I think
we would need to check for another state too.

Hastily back to debugging,
Hugh

--- 3.7+git+petr/kernel/sched/fair.c	2012-12-16 16:35:08.724441527 -0800
+++ linux/kernel/sched/fair.c	2012-12-18 21:37:24.727964195 -0800
@@ -793,8 +793,11 @@ unsigned int sysctl_numa_balancing_scan_
 
 static void task_numa_placement(struct task_struct *p)
 {
-	int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
+	int seq;
 
+	if (!p->mm)	/* for example, ksmd faulting in a user's mm */
+		return;
+	seq = ACCESS_ONCE(p->mm->numa_scan_seq);
 	if (p->numa_scan_seq == seq)
 		return;
 	p->numa_scan_seq = seq;
--- 3.7+git+petr/mm/ksm.c	2012-12-18 12:15:04.972032321 -0800
+++ linux/mm/ksm.c	2012-12-19 09:21:12.004004777 -0800
@@ -1151,7 +1151,6 @@ struct rmap_item *unstable_tree_search_i
 
 	nid = get_kpfn_nid(page_to_pfn(page));
 	root = &root_unstable_tree[nid];
-
 	new = &root->rb_node;
 
 	while (*new) {
@@ -1174,22 +1173,16 @@ struct rmap_item *unstable_tree_search_i
 		}
 
 		/*
-		 * When there isn't same page location, don't do anything.
-		 * If tree_page was migrated previously, it will be flushed
-		 * out and put into right unstable tree next time. If the
-		 * page was migrated in the meantime, it will be ignored
-		 * this round. When both pages were migrated to the same
-		 * node, ignore them too.
+		 * If tree_page has been migrated to another NUMA node, it
+		 * will be flushed out and put into the right unstable tree
+		 * next time: only merge with it if merge_across_nodes.
 		 * Just notice, we don't have similar problem for PageKsm
-		 * because their migration is disabled now. (62b61f611e) */
-
-#ifdef CONFIG_NUMA
-		if (page_to_nid(page) != page_to_nid(tree_page) ||
-			tree_rmap_item->nid != page_to_nid(tree_page)) {
+		 * because their migration is disabled now. (62b61f611e)
+		 */
+		if (!ksm_merge_across_nodes && page_to_nid(tree_page) != nid) {
 			put_page(tree_page);
 			return NULL;
 		}
-#endif
 
 		ret = memcmp_pages(page, tree_page);
 
@@ -1209,7 +1202,7 @@ struct rmap_item *unstable_tree_search_i
 	rmap_item->address |= UNSTABLE_FLAG;
 	rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK);
 #ifdef CONFIG_NUMA
-	rmap_item->nid = page_to_nid(page);
+	rmap_item->nid = nid;
 #endif
 	rb_link_node(&rmap_item->node, parent, new);
 	rb_insert_color(&rmap_item->node, root);
@@ -1226,6 +1219,13 @@ struct rmap_item *unstable_tree_search_i
 static void stable_tree_append(struct rmap_item *rmap_item,
 			       struct stable_node *stable_node)
 {
+#ifdef CONFIG_NUMA
+	/*
+	 * Usually rmap_item->nid is already set correctly,
+	 * but it may be wrong after switching merge_across_nodes.
+	 */
+	rmap_item->nid = get_kpfn_nid(stable_node->kpfn);
+#endif
 	rmap_item->head = stable_node;
 	rmap_item->address |= STABLE_FLAG;
 	hlist_add_head(&rmap_item->hlist, &stable_node->hlist);
@@ -1852,7 +1852,7 @@ static struct stable_node *ksm_check_sta
 	struct rb_node *node;
 	int nid;
 
-	for (nid = 0; nid < MAX_NUMNODES; nid++)
+	for (nid = 0; nid < nr_node_ids; nid++)
 		for (node = rb_first(&root_stable_tree[nid]); node;
 				node = rb_next(node)) {
 			struct stable_node *stable_node;
@@ -2030,22 +2030,15 @@ static ssize_t merge_across_nodes_store(
 		return -EINVAL;
 
 	mutex_lock(&ksm_thread_mutex);
-	if (ksm_run & KSM_RUN_MERGE) {
-		err = -EBUSY;
-	} else {
-		if (ksm_merge_across_nodes != knob) {
-			if (ksm_pages_shared > 0)
-				err = -EBUSY;
-			else
-				ksm_merge_across_nodes = knob;
-		}
+	if (ksm_merge_across_nodes != knob) {
+		if (ksm_pages_shared)
+			err = -EBUSY;
+		else
+			ksm_merge_across_nodes = knob;
 	}
-
-	if (err)
-		count = err;
 	mutex_unlock(&ksm_thread_mutex);
 
-	return count;
+	return err ? err : count;
 }
 KSM_ATTR(merge_across_nodes);
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]