On Wed, Mar 20, 2024 at 05:58:53PM +0100, Andre Noll wrote:
> On Wed, Mar 20, 07:53, Darrick J. Wong wrote
> > On Wed, Mar 20, 2024 at 09:39:57AM +0100, Andre Noll wrote:
> > > On Tue, Mar 19, 11:16, Dave Chinner wrote
> > > > +	/*
> > > > +	 * Well, that sucks. Put the inode back on the inactive queue.
> > > > +	 * Do this while still under the ILOCK so that we can set the
> > > > +	 * NEED_INACTIVE flag and clear the INACTIVATING flag an not
> > > > +	 * have another lookup race with us before we've finished
> > > > +	 * putting the inode back on the inodegc queue.
> > > > +	 */
> > > > +	spin_unlock(&ip->i_flags_lock);
> > > > +	ip->i_flags |= XFS_NEED_INACTIVE;
> > > > +	ip->i_flags &= ~XFS_INACTIVATING;
> > > > +	spin_unlock(&ip->i_flags_lock);
> > >
> > > This doesn't look right. Shouldn't the first spin_unlock() be spin_lock()?
> >
> > Yes. So much for my hand inspection of code. :(
>
> Given enough hand inspections, all bugs are shallow :)

Sparse should have found that, if I ran it. :/

Ah, but sparse gets confused by the fact that the return from the
function may or may not have unlocked stuff:

fs/xfs/xfs_icache.c:355:9: warning: context imbalance in 'xfs_iget_recycle' - unexpected unlock
fs/xfs/xfs_icache.c:414:28: warning: context imbalance in 'xfs_iget_reactivate' - unexpected unlock
fs/xfs/xfs_icache.c:656:28: warning: context imbalance in 'xfs_iget_cache_hit' - different lock contexts for basic block

So if I fix that (that'll be patch 5 for this series), I get:

  CC      fs/xfs/xfs_icache.o
  CHECK   fs/xfs/xfs_icache.c
fs/xfs/xfs_icache.c:459:28: warning: context imbalance in 'xfs_iget_reactivate' - unexpected unlock

Yup, sparse now catches the unbalanced locking. I just haven't thought
to run sparse on XFS recently - running sparse on a full kernel build
is just .... awful. I think I'll change my build script so that when I
do an '--xfs-only' build it also enables sparse, as it's only
rebuilding fs/xfs at that point....

> > (Doesn't simple lock debugging catch these sorts of things?)
>
> Maybe this error path doesn't get exercised because xfs_reinit_inode()
> never fails. AFAICT, it can only fail if security_inode_alloc()
> can't allocate the composite inode blob.

Which syzkaller triggers every so often. I also do all my testing with
selinux enabled, so security_inode_alloc() is actually being exercised
and definitely has the potential to fail on my small memory configs...

> > ((It sure would be nice if locking returned a droppable "object" to do
> > the unlock ala Rust and then spin_lock could be __must_check.))
>
> There's the *LOCK_GUARD* macros which employ gcc's cleanup attribute
> to automatically call e.g. spin_unlock() when a variable goes out of
> scope (see 54da6a0924311).

IMO, the LOCK_GUARD stuff is an awful anti-pattern. It means some error
paths -look broken- because they lack unlocks, and we have to
explicitly change code to return from functions with the guarded locks
held.

This is a diametrically opposed locking pattern to the existing
non-guarded locking patterns - correct behaviour in one pattern is
broken behaviour in the other, and vice versa. That's just -insane-
from a code maintenance point of view.

And they are completely useless for anything complex like these XFS
icache functions because the lock scope is not balanced across
functions. The lock can also be taken by functions called within the
guard scope, so using guarded lock scoping would result in deadlocks.
i.e. xfs_inodegc_queue() needs to take the i_flags_lock, so it must be
dropped before we call that (see the sketch below).
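To make that concrete, here's a rough sketch - based on the hunk quoted
above with Andre's spin_lock() fix applied, not a verbatim copy of
anything in the series - of why guard scoping can't cover this path.
The explicit pattern drops the lock before calling back into code that
takes it again:

	/* explicit locking: fix up the flags, drop the lock, then queue */
	spin_lock(&ip->i_flags_lock);
	ip->i_flags |= XFS_NEED_INACTIVE;
	ip->i_flags &= ~XFS_INACTIVATING;
	spin_unlock(&ip->i_flags_lock);

	xfs_inodegc_queue(ip);		/* takes i_flags_lock internally */

	/*
	 * guarded locking: the unlock is tied to the end of the scope, so
	 * calling xfs_inodegc_queue() from inside the guard self-deadlocks
	 * on i_flags_lock, and hoisting the call out of the scope just
	 * reinvents the explicit pattern above.
	 */
	scoped_guard(spinlock, &ip->i_flags_lock) {
		ip->i_flags |= XFS_NEED_INACTIVE;
		ip->i_flags &= ~XFS_INACTIVATING;
		xfs_inodegc_queue(ip);	/* deadlock: lock still held */
	}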
So, yeah, lock guards seem to me to be largely just a "look ma, no need
for rust because we can mightily abuse the C preprocessor!" anti-pattern
looking for a problem to solve.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx