* Mirsad Todorovac <mirsad.todorovac@xxxxxxxxxxxx> [230923 03:27]: > On 9/22/23 15:51, Liam R. Howlett wrote: ... > > > [ 6413.367326] ================================================================== > > > [ 6413.367349] BUG: KCSAN: data-race in mas_topiary_replace / mtree_range_walk > > > > > > [ 6413.367375] write to 0xffff8883a0c5db00 of 8 bytes by task 5475 on cpu 24: > > > [ 6413.367386] mas_topiary_replace (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:491 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:1701 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2590) > > > [ 6413.367399] mas_spanning_rebalance.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2664 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2961) > > > [ 6413.367413] mas_wr_spanning_store.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:3894) > > > [ 6413.367428] mas_wr_store_entry.isra.0 (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4242) > > > [ 6413.367442] mas_store_prealloc (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:256 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:5408) > > > [ 6413.367455] vma_merge (/home/marvin/linux/kernel/torvalds2/mm/internal.h:1127 /home/marvin/linux/kernel/torvalds2/mm/mmap.c:1005) > > > [ 6413.367466] mprotect_fixup (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:632) > > > [ 6413.367480] do_mprotect_pkey (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:819) > > > [ 6413.367494] __x64_sys_mprotect (/home/marvin/linux/kernel/torvalds2/mm/mprotect.c:837) > > > [ 6413.367503] do_syscall_64 (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:50 /home/marvin/linux/kernel/torvalds2/arch/x86/entry/common.c:80) > > > [ 6413.367517] entry_SYSCALL_64_after_hwframe (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:120) > > > > > > [ 6413.367534] read to 0xffff8883a0c5db00 of 8 bytes by task 5558 on cpu 11: > > > [ 6413.367545] mtree_range_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:539 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:2789) > > > [ 6413.367556] mas_walk (/home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:251 /home/marvin/linux/kernel/torvalds2/lib/maple_tree.c:4844) > > > [ 6413.367567] lock_vma_under_rcu (/home/marvin/linux/kernel/torvalds2/mm/memory.c:5436) > > > [ 6413.367579] do_user_addr_fault (/home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1357) > > > [ 6413.367593] exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/paravirt.h:695 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1513 /home/marvin/linux/kernel/torvalds2/arch/x86/mm/fault.c:1561) > > > [ 6413.367602] asm_exc_page_fault (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:570) > > > > > > [ 6413.367617] value changed: 0xffff888341d43116 -> 0xffff888340f92116 > > > > > > [ 6413.367632] Reported by Kernel Concurrency Sanitizer on: > > > [ 6413.367640] CPU: 11 PID: 5558 Comm: ThreadPoolForeg Tainted: G L 6.6.0-rc2-kcsan-00143-gb5cbe7c00aa0 #41 > > > [ 6413.367653] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023 > > > [ 6413.367660] ================================================================== > > > > > > For your convenience, took the trouble of finding those suspicious lines of code: > > > > > > The write side: > > > > > > lib/maple_tree.c:491 > > > -------------------- > > > 456 /* > > > 457 * mas_set_parent() - Set the parent node and encode the slot > > > 458 * @enode: The encoded maple node. > > > 459 * @parent: The encoded maple node that is the parent of @enode. > > > 460 * @slot: The slot that @enode resides in @parent. > > > 461 * > > > 462 * Slot number is encoded in the enode->parent bit 3-6 or 2-6, depending on the > > > 463 * parent type. > > > 464 */ > > > 465 static inline > > > 466 void mas_set_parent(struct ma_state *mas, struct maple_enode *enode, > > > 467 const struct maple_enode *parent, unsigned char slot) > > > 468 { > > > 469 unsigned long val = (unsigned long)parent; > > > 470 unsigned long shift; > > > 471 unsigned long type; > > > 472 enum maple_type p_type = mte_node_type(parent); > > > 473 > > > 474 MAS_BUG_ON(mas, p_type == maple_dense); > > > 475 MAS_BUG_ON(mas, p_type == maple_leaf_64); > > > 476 > > > 477 switch (p_type) { > > > 478 case maple_range_64: > > > 479 case maple_arange_64: > > > 480 shift = MAPLE_PARENT_SLOT_SHIFT; > > > 481 type = MAPLE_PARENT_RANGE64; > > > 482 break; > > > 483 default: > > > 484 case maple_dense: > > > 485 case maple_leaf_64: > > > 486 shift = type = 0; > > > 487 break; > > > 488 } > > > 489 > > > 490 val &= ~MAPLE_NODE_MASK; /* Clear all node metadata in parent */ > > > → 491 val |= (slot << shift) | type; > > > 492 mte_to_node(enode)->parent = ma_parent_ptr(val); > > > 493 } This should probably be 492, not 491. I know what is racing here and it shouldn't cause issue. ... > > > The read side: > > > > > > 527 /* > > > 528 * ma_dead_node() - check if the @enode is dead. > > > 529 * @enode: The encoded maple node > > > 530 * > > > 531 * Return: true if dead, false otherwise. > > > 532 */ > > > 533 static inline bool ma_dead_node(const struct maple_node *node) > > > 534 { > > > 535 struct maple_node *parent; > > > 536 > > > 537 /* Do not reorder reads from the node prior to the parent check */ > > > 538 smp_rmb(); > > > → 539 parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK); > > > 540 return (parent == node); > > > 541 } This is the correct line. ... > > > > > > as above, but the smb_rmb() protection is here, so the KCSAN warning should be double-checked for > > > validity. > > > > > > But I do not really understand maple trees to their depth, I am only reporting the symptomatic > > > outlook of the reported data-race. > > > > This is the most complex part of the maple tree, and unique to the > > ability to store a range over multiple existing entries. I recently > > rewrote this particular section to avoid a potential live-lock issue. > > I see. > > > I am confident that the parent pointer will not be set to the node > > pointer as checked by ma_dead_node(). But in an abundance of caution > > and to resolve this potential false-positive, I am looking at this in > > more detail. > > > > This race is to see if the walk down the tree into unchanged data will > > be seen before it is placed in the new subtree with its data also > > unchanged. Since we know the parent can never be the node, I feel this > > is safe - but I'm not willing to live with the complaints from kasan. > > I cannot analyse a couple of thousand lines at such a short notice. Don't worry, I will :) > > It looks suspicious because val in line 491 in a local variable and thus > isn't available elsewhere. It is used in the node->parent, as described above. It is a race, but it doesn't matter who wins. > > Maybe the compiler (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0) did something > to the code? Probably off-by-one line. > > > > This is all-in-all a very interested subject, and I wish there was a way to just slurp all those > > > interesting kernel intrinsics into the brain, but it just ain't that easy. Forgive me for being > > > open ... > > > > I appreciate that and your detailed analysis with code pointers of where > > this happens. Is this easy to recreate? If so, how? Can you attach > > your kernel config? > > Got that attached first, so I do not forget. :-/ > > Recreate? Actually it happened quite a number of times on my configuration (480+?). I'm having issues recreating it because I hit a lot of races before this one in my test setup. I will keep working on reproducing this race, but in the mean time can you test the attached patch with your setup? ... Thanks, Liam
diff --git a/lib/maple_tree.c b/lib/maple_tree.c index ee1ff0c59fd7..8f68582451c4 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -1699,6 +1699,7 @@ static inline void mas_adopt_children(struct ma_state *mas, do { child = mas_slot_locked(mas, slots, offset); mas_set_parent(mas, child, parent, offset); + smp_wmb(); /* Needed for RCU */ } while (offset--); }