On Wed, Dec 11, 2024 at 10:02:12AM +0100, Peter Zijlstra wrote: > On Wed, Dec 11, 2024 at 09:13:16AM +0100, Christian Brauner wrote: > > Hey Peter, > > > > I had a question for you and meant to Cc you but forgot. This makes one > > of our rbtree uses lockless using the seqlock pattern. See below. > > > > I saw that in 50a38035ed5c ("rbtree: provide rb_find_rcu() / > > rb_find_add_rcu()") you added new _rcu() variants. > > The original patches are much older, see: > > d72da4a4d973 ("rbtree: Make lockless searches non-fatal") > ade3f510f93a ("rbtree: Implement generic latch_tree") > > (aw gawd, almost 10 years) > > > We're using another search function that allows us to walk the tree in > > either direction: > > > > guard(read_lock)(&mnt_ns_tree_lock); > > for (;;) { > > struct rb_node *node; > > > > if (previous) > > node = rb_prev(&mntns->mnt_ns_tree_node); > > else > > node = rb_next(&mntns->mnt_ns_tree_node); > > if (!node) > > return ERR_PTR(-ENOENT); > > > > mntns = node_to_mnt_ns(node); > > node = &mntns->mnt_ns_tree_node; > > > > But afaict neither rb_prev() nor rb_next() are rcu safe. Have you ever > > considered adding rcu safe variants for those two as well? > > Urgh, those are hard :-( > > So back when I did the lockless lookups, I only ensured the child > pointers were 'stable'. I did not deal with the parent (and colour) > pointer. > > The next/prev iterators very much rely on the parent pointer. > > Someone would have to go through the tree rotations again and see if it > is possible to also update the parent pointers in such a way as to not > create temporary loops. > > Notably, the thing you want to avoid is an interrupt doing a tree > traversal on the CPU that's doing the tree rotation getting stuck. > > The other, possibly even harder option would be to (finally) implement > threaded RB trees, where the current NULL child pointers become a 'list' > pointer to the next (in-order) element. But that too makes rotations > 'interesting' and must avoid creating loops. Ah, "fun". > > But in all those cases you have to also consider the ramifications of > rb_next/prev hitting a modification, I suspect like with the simple > lookup, you can miss entire subtrees. So depending on the requirements, I think it's fine as long was have the ability to detect that the tree was modified and retry - like in the simple lookup case. I was happy to see that you guarantee that only false negatives are possible. Last I looked at this some time ago Paul was much more pessimistic about walking the tree with seqlock and rcu. It's good that this now work afaik. > this might not be suitable for you. > > The far easier option might be to simply add a list_head along with the > rb_node and iterate that -- normally the problem with this is that you > can't easily move elements around in an RCU-list, but luck will have it > you don't move elements around. Your tree location is very static Good idea. Yeah, we don't move anything around. And our insertions are done with a key that's unique - the 64bit mount id - for the system lifetime. > afaict. >