Re: BUG: maple_tree: KCSAN: data-race in mas_topiary_replace / mtree_range_walk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Mirsad Todorovac <mirsad.todorovac@xxxxxxxxxxxx> [230923 03:27]:
> On 9/22/23 15:51, Liam R. Howlett wrote:

...

> > > [ 6413.367326] ==================================================================
> > > [ 6413.367349] BUG: KCSAN: data-race in mas_topiary_replace / mtree_range_walk
> > > 
> > > [ 6413.367375] write to 0xffff8883a0c5db00 of 8 bytes by task 5475 on cpu 24:
> > > [ 6413.367386] mas_topiary_replace (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:491 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:1701 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2590)
> > > [ 6413.367399] mas_spanning_rebalance.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2664 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2961)
> > > [ 6413.367413] mas_wr_spanning_store.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:3894)
> > > [ 6413.367428] mas_wr_store_entry.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4242)
> > > [ 6413.367442] mas_store_prealloc (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:256 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:5408)
> > > [ 6413.367455] vma_merge (/home/marvin/linux/kernel/torvalds2/mm/internal.h:1127 /home/marvin/linux/kernel/torvalds2/mm/mmap.c:1005)
> > > [ 6413.367466] mprotect_fixup (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:632)
> > > [ 6413.367480] do_mprotect_pkey (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:819)
> > > [ 6413.367494] __x64_sys_mprotect (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:837)
> > > [ 6413.367503] do_syscall_64 (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:50 /home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:80)
> > > [ 6413.367517] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:120)
> > > 
> > > [ 6413.367534] read to 0xffff8883a0c5db00 of 8 bytes by task 5558 on cpu 11:
> > > [ 6413.367545] mtree_range_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:539 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2789)
> > > [ 6413.367556] mas_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:251 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4844)
> > > [ 6413.367567] lock_vma_under_rcu (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5436)
> > > [ 6413.367579] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1357)
> > > [ 6413.367593] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561)
> > > [ 6413.367602] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570)
> > > 
> > > [ 6413.367617] value changed: 0xffff888341d43116 -> 0xffff888340f92116
> > > 
> > > [ 6413.367632] Reported by Kernel Concurrency Sanitizer on:
> > > [ 6413.367640] CPU: 11 PID: 5558 Comm: ThreadPoolForeg Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41
> > > [ 6413.367653] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
> > > [ 6413.367660] ==================================================================
> > > 
> > > For your convenience, took the trouble of finding those suspicious lines of code:
> > > 
> > > The write side:
> > > 
> > > lib/maple_tree.c:491
> > > --------------------
> > >    456 /*
> > >    457  * mas_set_parent() - Set the parent node and encode the slot
> > >    458  * @enode: The encoded maple node.
> > >    459  * @parent: The encoded maple node that is the parent of @enode.
> > >    460  * @slot: The slot that @enode resides in @parent.
> > >    461  *
> > >    462  * Slot number is encoded in the enode->parent bit 3-6 or 2-6, depending on the
> > >    463  * parent type.
> > >    464  */
> > >    465 static inline
> > >    466 void mas_set_parent(struct ma_state *mas, struct maple_enode *enode,
> > >    467                     const struct maple_enode *parent, unsigned char slot)
> > >    468 {
> > >    469         unsigned long val = (unsigned long)parent;
> > >    470         unsigned long shift;
> > >    471         unsigned long type;
> > >    472         enum maple_type p_type = mte_node_type(parent);
> > >    473
> > >    474         MAS_BUG_ON(mas, p_type == maple_dense);
> > >    475         MAS_BUG_ON(mas, p_type == maple_leaf_64);
> > >    476
> > >    477         switch (p_type) {
> > >    478         case maple_range_64:
> > >    479         case maple_arange_64:
> > >    480                 shift = MAPLE_PARENT_SLOT_SHIFT;
> > >    481                 type = MAPLE_PARENT_RANGE64;
> > >    482                 break;
> > >    483         default:
> > >    484         case maple_dense:
> > >    485         case maple_leaf_64:
> > >    486                 shift = type = 0;
> > >    487                 break;
> > >    488         }
> > >    489
> > >    490         val &= ~MAPLE_NODE_MASK; /* Clear all node metadata in parent */
> > > → 491         val |= (slot << shift) | type;
> > >    492         mte_to_node(enode)->parent = ma_parent_ptr(val);
> > >    493 }

This should probably be 492, not 491.  I know what is racing here and it
shouldn't cause issue.

...
> > > The read side:
> > > 
> > >     527 /*
> > >     528  * ma_dead_node() - check if the @enode is dead.
> > >     529  * @enode: The encoded maple node
> > >     530  *
> > >     531  * Return: true if dead, false otherwise.
> > >     532  */
> > >     533 static inline bool ma_dead_node(const struct maple_node *node)
> > >     534 {
> > >     535         struct maple_node *parent;
> > >     536
> > >     537         /* Do not reorder reads from the node prior to the parent check */
> > >     538         smp_rmb();
> > > →  539         parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
> > >     540         return (parent == node);
> > >     541 }

This is the correct line.

...
> > > 
> > > as above, but the smb_rmb() protection is here, so the KCSAN warning should be double-checked for
> > > validity.
> > > 
> > > But I do not really understand maple trees to their depth, I am only reporting the symptomatic
> > > outlook of the reported data-race.
> > 
> > This is the most complex part of the maple tree, and unique to the
> > ability to store a range over multiple existing entries.  I recently
> > rewrote this particular section to avoid a potential live-lock issue.
> 
> I see.
> 
> > I am confident that the parent pointer will not be set to the node
> > pointer as checked by ma_dead_node().  But in an abundance of caution
> > and to resolve this potential false-positive, I am looking at this in
> > more detail.
> > 
> > This race is to see if the walk down the tree into unchanged data will
> > be seen before it is placed in the new subtree with its data also
> > unchanged.  Since we know the parent can never be the node, I feel this
> > is safe - but I'm not willing to live with the complaints from kasan.
> 
> I cannot analyse a couple of thousand lines at such a short notice.

Don't worry, I will :)

> 
> It looks suspicious because val in line 491 in a local variable and thus
> isn't available elsewhere.

It is used in the node->parent, as described above.  It is a race, but
it doesn't matter who wins.

> 
> Maybe the compiler (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0) did something
> to the code?

Probably off-by-one line.

> 
> > > This is all-in-all a very interested subject, and I wish there was a way to just slurp all those
> > > interesting kernel intrinsics into the brain, but it just ain't that easy. Forgive me for being
> > > open ...
> > 
> > I appreciate that and your detailed analysis with code pointers of where
> > this happens.  Is this easy to recreate?  If so, how?  Can you attach
> > your kernel config?
> 
> Got that attached first, so I do not forget. :-/
> 
> Recreate? Actually it happened quite a number of times on my configuration (480+?).

I'm having issues recreating it because I hit a lot of races before this
one in my test setup.  I will keep working on reproducing this race, but
in the mean time can you test the attached patch with your setup?

...

Thanks,
Liam
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index ee1ff0c59fd7..8f68582451c4 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1699,6 +1699,7 @@ static inline void mas_adopt_children(struct ma_state *mas,
 	do {
 		child = mas_slot_locked(mas, slots, offset);
 		mas_set_parent(mas, child, parent, offset);
+		smp_wmb(); /* Needed for RCU */
 	} while (offset--);
 }
 

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux