Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 22 Mar 2024 12:09:11 +1100

On Thu, Mar 21, 2024 at 10:59:22AM +0100, Andre Noll wrote:
> On Thu, Mar 21, 09:51, Dave Chinner wrote
> > I just haven't thought to run sparse on XFS recently - running
> > sparse on a full kernel build is just .... awful. I think I'll
> > change my build script so that when I do an '--xfs-only' built it
> > also enables sparse as it's only rebuilding fs/xfs at that point....
> 
> Would it be less awful to run coccinelle with a selected set of
> semantic patches that catch defective patterns such as double
> unlock/free?

Much more awful - because then I have to write scripts to do this
checking rather than just add a command line parameter to the build.

> > > > (Doesn't simple lock debugging catch these sorts of things?)
> > > 
> > > Maybe this error path doesn't get exercised because xfs_reinit_inode()
> > > never fails. AFAICT, it can only fail if security_inode_alloc()
> > > can't allocate the composite inode blob.
> > 
> > Which syzkaller triggers every so often. I also do all my testing
> > with selinux enabled, so security_inode_alloc() is actually being
> > exercised and definitely has the potential to fail on my small
> > memory configs...
> 
> One could try to trigger ENOMEM more easily in functions like this
> by allocating bigger slab caches for debug builds.

That doesn't solve the problem - people keep trying to tell us that
all we need it "better testing" when the right solution to the
problem is for memory allocation to *never fail* unless the caller
says it is OK to fail. Better error injection and/or forced failures
don't actually help us all that much because of the massive scope of
the error checking that has to be done. Getting rid of the need for
error checking altogether is a much better long term solution to
this problem...

> > > > ((It sure would be nice if locking returned a droppable "object" to do
> > > > the unlock ala Rust and then spin_lock could be __must_check.))
> > > 
> > > There's the *LOCK_GUARD* macros which employ gcc's cleanup attribute
> > > to automatically call e.g. spin_unlock() when a variable goes out of
> > > scope (see 54da6a0924311).
> > 
> > IMO, the LOCK_GUARD stuff is an awful anti-pattern. It means some
> > error paths -look broken- because they lack unlocks, and we have to
> > explicitly change code to return from functions with the guarded
> > locks held. This is a diametrically opposed locking pattern to the
> > existing non-guarded lockign patterns - correct behaviour in one
> > pattern is broken behaviour in the other, and vice versa.
> > 
> > That's just -insane- from a code maintenance point of view.
> 
> Converting all locks in fs/xfs in one go is not an option either, as
> this would be too big to review, and non-trivial to begin with.

It's simply not possible because of the issues I mentioned, plus
others.

> There
> are 180+ calls to spin_lock(), and that's just the spinlocks. Also
> these patches would interfere badly with ongoing work.

ANywhere you have unbalanced lock contexts, non-trivial nested
locking, reverse order locking (via trylocks), children doing unlock
and lock to change lock contexts, etc then this "guarded lock scope"
does not work. XFS is -full- of these non-trivial locking
algorithms, so it's just not a good idea to even start trying to do
a conversion...

> > And they are completely useless for anythign complex like these
> > XFS icache functions because the lock scope is not balanced across
> > functions.
> >
> > The lock can also be taken by functions called within the guard
> > scope, and so using guarded lock scoping would result in deadlocks.
> > i.e. xfs_inodegc_queue() needs to take the i_flags_lock, so it must
> > be dropped before we call that.
> 
> Yup, these can't use the LOCK_GUARD macros, which leads to an unholy
> mix of guarded and unguarded locks.

Exactly my point.

> > So, yeah, lock guards seem to me to be largely just a "look ma, no
> > need for rust because we can mightily abuse the C preprocessor!"
> > anti-pattern looking for a problem to solve.
> 
> Do you think there is a valid use case for the cleanup attribute,
> or do you believe that the whole concept is mis-designed?

Sure, there's plenty of cases where scoped cleanup attributes really
does make the code better.  e.g. we had XFS changes that used this
attribute in a complex loop iterator rejected back before it became
accepted so that this lock guard template thingy could be
implemented with it.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx