Re: [PATCH 3/5] fs: lockless mntns rbtree lookup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 11, 2024 at 10:02:12AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 11, 2024 at 09:13:16AM +0100, Christian Brauner wrote:
> > Hey Peter,
> > 
> > I had a question for you and meant to Cc you but forgot. This makes one
> > of our rbtree uses lockless using the seqlock pattern. See below.
> > 
> > I saw that in 50a38035ed5c ("rbtree: provide rb_find_rcu() /
> > rb_find_add_rcu()") you added new _rcu() variants.
> 
> The original patches are much older, see:
> 
>   d72da4a4d973 ("rbtree: Make lockless searches non-fatal")
>   ade3f510f93a ("rbtree: Implement generic latch_tree")
> 
> (aw gawd, almost 10 years)
> 
> > We're using another search function that allows us to walk the tree in
> > either direction:
> > 
> >         guard(read_lock)(&mnt_ns_tree_lock);
> >         for (;;) {
> >                 struct rb_node *node;
> > 
> >                 if (previous)
> >                         node = rb_prev(&mntns->mnt_ns_tree_node);
> >                 else
> >                         node = rb_next(&mntns->mnt_ns_tree_node);
> >                 if (!node)
> >                         return ERR_PTR(-ENOENT);
> > 
> >                 mntns = node_to_mnt_ns(node);
> >                 node = &mntns->mnt_ns_tree_node;
> > 
> > But afaict neither rb_prev() nor rb_next() are rcu safe. Have you ever
> > considered adding rcu safe variants for those two as well?
> 
> Urgh, those are hard :-(
> 
> So back when I did the lockless lookups, I only ensured the child
> pointers were 'stable'. I did not deal with the parent (and colour)
> pointer.
> 
> The next/prev iterators very much rely on the parent pointer.
> 
> Someone would have to go through the tree rotations again and see if it
> is possible to also update the parent pointers in such a way as to not
> create temporary loops.
> 
> Notably, the thing you want to avoid is an interrupt doing a tree
> traversal on the CPU that's doing the tree rotation getting stuck.
> 
> The other, possibly even harder option would be to (finally) implement
> threaded RB trees, where the current NULL child pointers become a 'list'
> pointer to the next (in-order) element. But that too makes rotations
> 'interesting' and must avoid creating loops.

Ah, "fun".

> 
> But in all those cases you have to also consider the ramifications of
> rb_next/prev hitting a modification, I suspect like with the simple
> lookup, you can miss entire subtrees. So depending on the requirements,

I think it's fine as long was have the ability to detect that the tree
was modified and retry - like in the simple lookup case. I was happy to
see that you guarantee that only false negatives are possible. Last I
looked at this some time ago Paul was much more pessimistic about
walking the tree with seqlock and rcu. It's good that this now work
afaik.

> this might not be suitable for you.
> 
> The far easier option might be to simply add a list_head along with the
> rb_node and iterate that -- normally the problem with this is that you
> can't easily move elements around in an RCU-list, but luck will have it
> you don't move elements around. Your tree location is very static

Good idea. Yeah, we don't move anything around. And our insertions are
done with a key that's unique - the 64bit mount id - for the system
lifetime.

> afaict.
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux