+ radix-tree-test-suite-multi-order-iteration-race.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: radix tree test suite: multi-order iteration race
has been added to the -mm tree.  Its filename is
     radix-tree-test-suite-multi-order-iteration-race.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/radix-tree-test-suite-multi-order-iteration-race.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/radix-tree-test-suite-multi-order-iteration-race.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Subject: radix tree test suite: multi-order iteration race

Add a test which shows a race in the multi-order iteration code.  This test
reliably hits the race in under a second on my machine, and is the result
of a real bug report against kernel a production v4.15 based kernel
(4.15.6-300.fc27.x86_64).  With a real kernel this issue is hit when using
order 9 PMD DAX radix tree entries.

The race has to do with how we tear down multi-order sibling entries when
we are removing an item from the tree.  Remember that an order 2 entry
looks like this:

struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the three
slots following 'entry' contain sibling pointers which point back to
'entry.'

When we delete 'entry' from the tree, we call :
  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a brief
period of time we end up with one or more of the siblings removed, so:

struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in the
tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() => skip_siblings() tries to
skip over the sibling entries in the slots, it currently does so with an
exact match on the slot directly preceding our current slot.  Normally this
works:
                                    V preceding slot
struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                            ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped and
then our sibling detection is interrupted:

                                           V preceding slot
struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                  ^ current slot

This means that the sibling pointers aren't recognized since they point all
the way back to 'entry', so we think that they are normal internal radix
tree pointers.  This causes us to think we need to walk down to a struct
radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at 'entry'.

In the radix tree test suite this will be caught by the address sanitizer:

==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
READ of size 8 at 0x60c0008ae400 thread T3
    #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
    #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
    #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
    #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
    #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)

Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@xxxxxxxxxxxxxxx
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: CR, Sapthagirish <sapthagirish.cr@xxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 tools/testing/radix-tree/multiorder.c |   63 ++++++++++++++++++++++++
 tools/testing/radix-tree/test.h       |    1 
 2 files changed, 64 insertions(+)

diff -puN tools/testing/radix-tree/multiorder.c~radix-tree-test-suite-multi-order-iteration-race tools/testing/radix-tree/multiorder.c
--- a/tools/testing/radix-tree/multiorder.c~radix-tree-test-suite-multi-order-iteration-race
+++ a/tools/testing/radix-tree/multiorder.c
@@ -16,6 +16,7 @@
 #include <linux/radix-tree.h>
 #include <linux/slab.h>
 #include <linux/errno.h>
+#include <pthread.h>
 
 #include "test.h"
 
@@ -624,6 +625,67 @@ static void multiorder_account(void)
 	item_kill_tree(&tree);
 }
 
+bool stop_iteration = false;
+
+static void *creator_func(void *ptr)
+{
+	/* 'order' is set up to ensure we have sibling entries */
+	unsigned int order = RADIX_TREE_MAP_SHIFT - 1;
+	struct radix_tree_root *tree = ptr;
+	int i;
+
+	for (i = 0; i < 10000; i++) {
+		item_insert_order(tree, 0, order);
+		item_delete_rcu(tree, 0);
+	}
+
+	stop_iteration = true;
+	return NULL;
+}
+
+static void *iterator_func(void *ptr)
+{
+	struct radix_tree_root *tree = ptr;
+	struct radix_tree_iter iter;
+	struct item *item;
+	void **slot;
+
+	while (!stop_iteration) {
+		rcu_read_lock();
+		radix_tree_for_each_slot(slot, tree, &iter, 0) {
+			item = radix_tree_deref_slot(slot);
+
+			if (!item)
+				continue;
+			if (radix_tree_deref_retry(item)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
+
+			item_sanity(item, iter.index);
+		}
+		rcu_read_unlock();
+	}
+	return NULL;
+}
+
+static void multiorder_iteration_race(void)
+{
+	const int num_threads = sysconf(_SC_NPROCESSORS_ONLN);
+	pthread_t worker_thread[num_threads];
+	RADIX_TREE(tree, GFP_KERNEL);
+	int i;
+
+	pthread_create(&worker_thread[0], NULL, &creator_func, &tree);
+	for (i = 1; i < num_threads; i++)
+		pthread_create(&worker_thread[i], NULL, &iterator_func, &tree);
+
+	for (i = 0; i < num_threads; i++)
+		pthread_join(worker_thread[i], NULL);
+
+	item_kill_tree(&tree);
+}
+
 void multiorder_checks(void)
 {
 	int i;
@@ -644,6 +706,7 @@ void multiorder_checks(void)
 	multiorder_join();
 	multiorder_split();
 	multiorder_account();
+	multiorder_iteration_race();
 
 	radix_tree_cpu_dead(0);
 }
diff -puN tools/testing/radix-tree/test.h~radix-tree-test-suite-multi-order-iteration-race tools/testing/radix-tree/test.h
--- a/tools/testing/radix-tree/test.h~radix-tree-test-suite-multi-order-iteration-race
+++ a/tools/testing/radix-tree/test.h
@@ -13,6 +13,7 @@ struct item {
 struct item *item_create(unsigned long index, unsigned int order);
 int __item_insert(struct radix_tree_root *root, struct item *item);
 int item_insert(struct radix_tree_root *root, unsigned long index);
+void item_sanity(struct item *item, unsigned long index);
 int item_insert_order(struct radix_tree_root *root, unsigned long index,
 			unsigned order);
 int item_delete(struct radix_tree_root *root, unsigned long index);
_

Patches currently in -mm which might be from ross.zwisler@xxxxxxxxxxxxxxx are

radix-tree-test-suite-fix-mapshift-build-target.patch
radix-tree-test-suite-fix-compilation-issue.patch
radix-tree-test-suite-add-item_delete_rcu.patch
radix-tree-test-suite-multi-order-iteration-race.patch
radix-tree-fix-multi-order-iteration-race.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux