Re: [PATCH RFC 7/7] libfs: Re-arrange locking in offset_iterate_dir()

"Liam R. Howlett" <Liam.Howlett@xxxxxxxxxx> · Fri, 16 Feb 2024 11:33:18 -0500

* Jan Kara <jack@xxxxxxx> [240216 05:15]:
> On Thu 15-02-24 16:07:42, Liam R. Howlett wrote:
> > * Jan Kara <jack@xxxxxxx> [240215 12:16]:
> > > On Thu 15-02-24 12:00:08, Liam R. Howlett wrote:
> > > > * Jan Kara <jack@xxxxxxx> [240215 08:16]:
> > > > > On Tue 13-02-24 16:38:08, Chuck Lever wrote:
> > > > > > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > > > 
> > > > > > Liam says that, unlike with xarray, once the RCU read lock is
> > > > > > released ma_state is not safe to re-use for the next mas_find() call.
> > > > > > But the RCU read lock has to be released on each loop iteration so
> > > > > > that dput() can be called safely.
> > > > > > 

...

> > 
> > > Then how do you imagine serializing the
> > > background operations like compaction? As much as I agree your argument is
> > > "theoretically clean", it seems a bit like a trap and there are definitely
> > > xarray users that are going to be broken by this (e.g.
> > > tag_pages_for_writeback())...
> > 
> > I'm not sure I follow the trap logic.  There are locks for the data
> > structure that need to be followed for reading (rcu) and writing
> > (spinlock for the maple tree).  If you don't correctly lock the data
> > structure then you really are setting yourself up for potential issues
> > in the future.
> > 
> > The limitations are outlined in the documentation as to how and when to
> > lock.  I'm not familiar with the xarray users, but it does check for
> > locking with lockdep, but the way this is written bypasses the lockdep
> > checking as the locks are taken and dropped without the proper scope.
> > 
> > If you feel like this is a trap, then maybe we need to figure out a new
> > plan to detect incorrect use?
> 
> OK, I was a bit imprecise. What I wanted to say is that this is a shift in
> the paradigm in the sense that previously, we mostly had (and still have)
> data structure APIs (lists, rb-trees, radix-tree, now xarray) that were
> guaranteeing that unless you call into the function to mutate the data
> structure it stays intact. Now maple trees are shifting more in a direction
> of black-box API where you cannot assume what happens inside. Which is fine
> but then we have e.g. these iterators which do not quite follow this
> black-box design and you have to remember subtle details like calling
> "mas_pause()" before unlocking which is IMHO error-prone. Ideally, users of
> the black-box API shouldn't be exposed to the details of the internal
> locking at all (but then the performance suffers so I understand why you do
> things this way). Second to this ideal variant would be if we could detect
> we unlocked the lock without calling xas_pause() and warn on that. Or maybe
> xas_unlock*() should be calling xas_pause() automagically and we'd have
> similar helpers for RCU to do the magic for you?
> 
...

Fair enough.  You can think of the radix-tree and xarray as black boxes
as well; not everyone knows what's going on in there.  The rbtree and
list are embedded in your own data structure and you have to do a lot of
work for your operations.

Right now, it is the way you have described; it's a data structure API.
It will work this way, but you lose the really neat and performant part
of the tree.  If we do this right, we can compact at the same time as
reading data.  We cannot do that when depending on the locks you use
today. 

You don't have to use the mas_pause() functions, there are less
error-prone methods such as mt_find() that Chuck used here.  If you want
to do really neat stuff though, you should be looking at the mas_*
interface.  And yes, we totally hand you enough rope to hang yourself.
I'm not sure we can have an advanced interface without doing this.

Using the correct calls will allow for us to smoothly transition to a
model where you don't depend on the data remaining in the same place in
the tree (or xarray).

> 
> > If you have other examples you think are unsafe then I can have a look
> > at them as well.
> 
> I'm currently not aware of any but I'll let you know if I find some.
> Missing xas/mas_pause() seems really easy.

What if we convert the rcu_read_lock() to a mas_read_lock() or
xas_read_lock() and we can check to ensure the state isn't being locked
without being in the 'parked' state (paused or otherwise)?

mas_read_lock(struct ma_state *mas) {
	assert(!mas_active(mas));
	rcu_read_lock();
}

Would that be a reasonable resolution to your concern?  Unfortunately,
what was done with the locking in this case would not be detected with
this change unless the rcu_read_lock() was replaced.  IOW, people could
still use the rcu_read_lock() and skip the detection.

Doing the same in the mas_unlock() doesn't make as much sense since that
may be called without the intent to reuse the state at all.  So we'd be
doing more work than necessary at the end of each loop or use.

Thanks,
Liam