On Thu 15-02-24 16:07:42, Liam R. Howlett wrote:
> * Jan Kara <jack@xxxxxxx> [240215 12:16]:
> > On Thu 15-02-24 12:00:08, Liam R. Howlett wrote:
> > > * Jan Kara <jack@xxxxxxx> [240215 08:16]:
> > > > On Tue 13-02-24 16:38:08, Chuck Lever wrote:
> > > > > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > >
> > > > > Liam says that, unlike with xarray, once the RCU read lock is
> > > > > released ma_state is not safe to re-use for the next mas_find() call.
> > > > > But the RCU read lock has to be released on each loop iteration so
> > > > > that dput() can be called safely.
> > > > >
> > > > > Thus we are forced to walk the offset tree with fresh state for each
> > > > > directory entry. mt_find() can do this for us, though it might be a
> > > > > little less efficient than maintaining ma_state locally.
> > > > >
> > > > > Since offset_iterate_dir() doesn't build ma_state locally any more,
> > > > > there's no longer a strong need for offset_find_next(). Clean up by
> > > > > rolling these two helpers together.
> > > > >
> > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > >
> > > > Well, in general I think even xas_next_entry() is not safe to use how
> > > > offset_find_next() was using it. Once you drop rcu_read_lock(),
> > > > xas->xa_node could go stale. But since you're holding inode->i_rwsem when
> > > > using offset_find_next() you should be protected from concurrent
> > > > modifications of the mapping (whatever the underlying data structure is) -
> > > > that's what makes xas_next_entry() safe AFAIU. Isn't that enough for the
> > > > maple tree? Am I missing something?
> > >
> > > If you are stopping, you should be pausing the iteration. Although this
> > > works today, it's not how it should be used because if we make changes
> > > (ie: compaction requires movement of data), then you may end up with a
> > > UAF issue. We'd have no way of knowing you are depending on the tree
> > > structure to remain consistent.
> >
> > I see.
> > But we have versions of these structures that have locking
> > external to the structure itself, don't we?
>
> Ah, I do have them - but I don't want to propagate its use as the dream
> is that it can be removed.
> >
> > Then how do you imagine serializing the
> > background operations like compaction? As much as I agree your argument is
> > "theoretically clean", it seems a bit like a trap and there are definitely
> > xarray users that are going to be broken by this (e.g.
> > tag_pages_for_writeback())...
>
> I'm not sure I follow the trap logic. There are locks for the data
> structure that need to be followed for reading (rcu) and writing
> (spinlock for the maple tree). If you don't correctly lock the data
> structure then you really are setting yourself up for potential issues
> in the future.
>
> The limitations are outlined in the documentation as to how and when to
> lock. I'm not familiar with the xarray users, but it does check for
> locking with lockdep, but the way this is written bypasses the lockdep
> checking as the locks are taken and dropped without the proper scope.
>
> If you feel like this is a trap, then maybe we need to figure out a new
> plan to detect incorrect use?

OK, I was a bit imprecise. What I wanted to say is that this is a paradigm
shift. Previously we mostly had (and still have) data structure APIs
(lists, rb-trees, radix-tree, now xarray) that guarantee the data structure
stays intact unless you call into a function that mutates it. Maple trees
are now shifting toward a black-box API where you cannot assume what happens
inside. That is fine, but then we have e.g. these iterators, which do not
quite follow this black-box design, and you have to remember subtle details
like calling "mas_pause()" before unlocking, which is IMHO error-prone.
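To make the hazard concrete, here is a minimal userspace sketch in C. It is not the kernel xarray or maple tree API: struct cursor, cursor_find(), cursor_pause(), compact(), walk_sum() and the other names are hypothetical, the "tree" is a single leaf node, and locking is only simulated by comments. The cursor caches a pointer to the node it last walked, loosely like ma_state/xa_state, and compact() stands in for background compaction by replacing that node and freeing the old one while the iterator holds no lock.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOTS 8

/* Hypothetical toy tree: the root is a single leaf holding SLOTS entries. */
struct node {
	void *slot[SLOTS];
};

struct tree {
	struct node *root;
};

/* Hypothetical cursor, loosely modelled on ma_state/xa_state: it remembers
 * the next index to examine and caches the node it last walked to. */
struct cursor {
	struct tree *tree;
	unsigned long index;
	struct node *node;	/* NULL means "re-walk from the root" */
};

/* Forget the cached node; the next cursor_find() restarts from the root. */
void cursor_pause(struct cursor *c)
{
	c->node = NULL;
}

/* Return the next non-NULL entry at or after c->index, or NULL at the end. */
void *cursor_find(struct cursor *c)
{
	if (!c->node)
		c->node = c->tree->root;	/* the root may have changed */
	assert(c->node != NULL);
	while (c->index < SLOTS) {
		void *entry = c->node->slot[c->index++];
		if (entry)
			return entry;
	}
	return NULL;
}

/* Stand-in for background compaction: copy the leaf and free the old one. */
void compact(struct tree *t)
{
	struct node *fresh = malloc(sizeof(*fresh));
	if (!fresh)
		abort();
	*fresh = *t->root;
	free(t->root);
	t->root = fresh;
}

/* Visit every entry, dropping the (simulated) lock after each one as a
 * dput()-style caller must, and letting compaction run in that window.
 * Returns the sum of the visited int values. */
int walk_sum(struct tree *t)
{
	struct cursor c = { .tree = t, .index = 0, .node = NULL };
	void *entry;
	int sum = 0;

	/* lock(t) -- simulated */
	while ((entry = cursor_find(&c)) != NULL) {
		sum += *(int *)entry;
		cursor_pause(&c);	/* must happen BEFORE unlocking */
		/* unlock(t) -- simulated */
		compact(t);		/* frees the node the cursor had cached */
		/* lock(t) -- simulated */
	}
	/* unlock(t) -- simulated */
	return sum;
}

int vals[3] = { 10, 20, 30 };

/* Build a tree with entries at indices 1, 4 and 6. */
struct tree *make_tree(void)
{
	struct tree *t = malloc(sizeof(*t));
	if (!t)
		abort();
	t->root = calloc(1, sizeof(*t->root));
	if (!t->root)
		abort();
	t->root->slot[1] = &vals[0];
	t->root->slot[4] = &vals[1];
	t->root->slot[6] = &vals[2];
	return t;
}

void free_tree(struct tree *t)
{
	free(t->root);
	free(t);
}
```

Without the cursor_pause() call, the next cursor_find() would read c->node after compact() had freed it, which is the use-after-free described above; with it, iteration restarts from the (possibly new) root at the saved index and visits each entry once.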
Ideally, users of the black-box API shouldn't be exposed to the details of
the internal locking at all (but then performance suffers, so I understand
why you do things this way). The next best thing would be to detect that we
unlocked the lock without calling xas_pause() and warn about it. Or maybe
xas_unlock*() should call xas_pause() automagically, and we'd have similar
helpers for RCU that do the magic for you?

> Looking through tag_pages_for_writeback(), it does what is necessary to
> keep a safe state - before it unlocks it calls xas_pause(). We have the
> same on maple tree; mas_pause(). This will restart the next operation
> from the root of the tree (the root can also change), to ensure that it
> is safe.

OK, I missed the xas_pause(). Thanks for correcting me.

> If you have other examples you think are unsafe then I can have a look
> at them as well.

I'm currently not aware of any, but I'll let you know if I find some.
Missing an xas/mas_pause() call seems really easy, though.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR