On Fri, Jan 06, 2017 at 02:58:51PM -0500, Dave Jones wrote: > On Fri, Jan 06, 2017 at 11:59:41AM -0500, Johannes Weiner wrote: > > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > > index 6f382e07de77..0783af1c0ebb 100644 > > --- a/lib/radix-tree.c > > +++ b/lib/radix-tree.c > > @@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root *root, > > update_node(node, private); > > } > > > > + WARN_ON_ONCE(!list_empty(&node->private_list)); > > + > > radix_tree_node_free(node); > > } > > } > > [ 8467.462878] WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200 > [ 8467.468770] CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3 > [ 8467.480436] Call Trace: > [ 8467.486213] dump_stack+0x4f/0x73 > [ 8467.491999] __warn+0xcb/0xf0 > [ 8467.497769] warn_slowpath_null+0x1d/0x20 > [ 8467.503566] delete_node+0x1e4/0x200 > [ 8467.509468] __radix_tree_delete_node+0xd/0x10 > [ 8467.515425] shadow_lru_isolate+0xe6/0x220 > [ 8467.521337] __list_lru_walk_one.isra.4+0x9b/0x190 > [ 8467.527176] ? memcg_drain_all_list_lrus+0x1d0/0x1d0 > [ 8467.533066] list_lru_walk_one+0x23/0x30 > [ 8467.538953] scan_shadow_nodes+0x2e/0x40 > [ 8467.544840] shrink_slab.part.44+0x23d/0x5d0 > [ 8467.550751] ? 0xffffffffa023a077 > [ 8467.556639] shrink_node+0x22c/0x330 > [ 8467.562542] kswapd+0x392/0x8f0 > [ 8467.568422] kthread+0x10f/0x150 > [ 8467.574313] ? mem_cgroup_shrink_node+0x2e0/0x2e0 > [ 8467.580266] ? kthread_create_on_node+0x60/0x60 > [ 8467.586203] ret_from_fork+0x29/0x40 > [ 8467.592109] ---[ end trace f790bafb683609d5 ]--- Argh, __radix_tree_delete_node() makes the flawed assumption that only the immediate branch it's mucking with can collapse. But this warning points out that a sibling branch can collapse too, including its leaf. Can you try if this patch fixes the problem? --- >From 8f06f9c18db63c66c4fe8047686901e025d36ff8 Mon Sep 17 00:00:00 2001 From: Johannes Weiner <hannes@xxxxxxxxxxx> Date: Fri, 6 Jan 2017 19:21:43 -0500 Subject: [PATCH] mm: workingset: fix cache tree sibling branch shrinking Several people report seeing warnings about inconsistent radix tree nodes followed by crashes in the workingset code, which all looked like use-after-free access from the shadow node shrinker. Dave Jones managed to reproduce the issue with a debug patch applied, which confirmed that the radix tree shrinking indeed frees shadow nodes while they are still linked to the shadow LRU: [ 8467.462878] WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200 [ 8467.468770] CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3 [ 8467.480436] Call Trace: [ 8467.486213] dump_stack+0x4f/0x73 [ 8467.491999] __warn+0xcb/0xf0 [ 8467.497769] warn_slowpath_null+0x1d/0x20 [ 8467.503566] delete_node+0x1e4/0x200 [ 8467.509468] __radix_tree_delete_node+0xd/0x10 [ 8467.515425] shadow_lru_isolate+0xe6/0x220 [ 8467.521337] __list_lru_walk_one.isra.4+0x9b/0x190 [ 8467.527176] ? memcg_drain_all_list_lrus+0x1d0/0x1d0 [ 8467.533066] list_lru_walk_one+0x23/0x30 [ 8467.538953] scan_shadow_nodes+0x2e/0x40 [ 8467.544840] shrink_slab.part.44+0x23d/0x5d0 [ 8467.550751] ? 0xffffffffa023a077 [ 8467.556639] shrink_node+0x22c/0x330 [ 8467.562542] kswapd+0x392/0x8f0 This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the inlined radix_tree_shrink(). The problem is with 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking"), which passes an update callback into the radix tree to link and unlink shadow leaf nodes when tree entries change, but forgot to pass the callback when reclaiming a shadow node. While the reclaimed shadow node itself is unlinked by the shrinker, its deletion from the tree can cause the left-most leaf node in the tree to be shrunk. If that happens to be a shadow node as well, we don't unlink it from the LRU as we should. Consider this tree, where the s are shadow entries: root->rnode | [0 n] | | [s ] [sssss] Now the shadow node shrinker reclaims the rightmost leaf node through the shadow node LRU: root->rnode | [0 ] | [s ] Because the parent of the deleted node is the first level below the root and has only one child in the left-most slot, the intermediate level is shrunk and the node containing the single shadow is put in its place: root->rnode | [s ] The shrinker again sees a single left-most slot in a first level node and thus decides to store the shadow in root->rnode directly and free the node - which is a leaf node on the shadow node LRU. root->rnode | s Without the update callback, the freed node remains on the shadow LRU, where it causes later shrinker runs to crash. Pass the node updater callback into __radix_tree_delete_node() in case the deletion causes the left-most branch in the tree to collapse too. Also add warnings when linked nodes are freed right away, rather than wait for the use-after-free when the list is scanned much later. Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking") Reported-by: Dave Chinner <david@xxxxxxxxxxxxx> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx> Reported-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> Reported-and-debugged-by: Dave Jones <davej@xxxxxxxxxxxxxxxxx> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> --- include/linux/radix-tree.h | 4 +++- lib/radix-tree.c | 11 +++++++++-- mm/workingset.c | 3 ++- 3 files changed, 14 insertions(+), 4 deletions(-) diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index 5dea8f6440e4..52bda854593b 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -306,7 +306,9 @@ void radix_tree_iter_replace(struct radix_tree_root *, void radix_tree_replace_slot(struct radix_tree_root *root, void **slot, void *item); void __radix_tree_delete_node(struct radix_tree_root *root, - struct radix_tree_node *node); + struct radix_tree_node *node, + radix_tree_update_node_t update_node, + void *private); void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *); void *radix_tree_delete(struct radix_tree_root *, unsigned long); void radix_tree_clear_tags(struct radix_tree_root *root, diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 6f382e07de77..0b92d605fb69 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -640,6 +640,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root, update_node(node, private); } + WARN_ON_ONCE(!list_empty(&node->private_list)); radix_tree_node_free(node); } } @@ -666,6 +667,7 @@ static void delete_node(struct radix_tree_root *root, root->rnode = NULL; } + WARN_ON_ONCE(!list_empty(&node->private_list)); radix_tree_node_free(node); node = parent; @@ -767,6 +769,7 @@ static void radix_tree_free_nodes(struct radix_tree_node *node) struct radix_tree_node *old = child; offset = child->offset + 1; child = child->parent; + WARN_ON_ONCE(!list_empty(&node->private_list)); radix_tree_node_free(old); if (old == entry_to_node(node)) return; @@ -1824,15 +1827,19 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot); * __radix_tree_delete_node - try to free node after clearing a slot * @root: radix tree root * @node: node containing @index + * @update_node: callback for changing leaf nodes + * @private: private data to pass to @update_node * * After clearing the slot at @index in @node from radix tree * rooted at @root, call this function to attempt freeing the * node and shrinking the tree. */ void __radix_tree_delete_node(struct radix_tree_root *root, - struct radix_tree_node *node) + struct radix_tree_node *node, + radix_tree_update_node_t update_node, + void *private) { - delete_node(root, node, NULL, NULL); + delete_node(root, node, update_node, private); } /** diff --git a/mm/workingset.c b/mm/workingset.c index 241fa5d6b3b2..abb58ffa3c64 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -473,7 +473,8 @@ static enum lru_status shadow_lru_isolate(struct list_head *item, if (WARN_ON_ONCE(node->exceptional)) goto out_invalid; inc_node_state(page_pgdat(virt_to_page(node)), WORKINGSET_NODERECLAIM); - __radix_tree_delete_node(&mapping->page_tree, node); + __radix_tree_delete_node(&mapping->page_tree, node, + workingset_update_node, mapping); out_invalid: spin_unlock(&mapping->tree_lock); -- 2.10.2 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>