Re: [PATCH 1/5] locking: Add rwsem_is_write_locked()

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 8 Sep 2023 10:44:13 +1000

On Thu, Sep 07, 2023 at 07:47:31PM -0400, Waiman Long wrote:
> 
> On 9/7/23 17:06, Waiman Long wrote:
> > 
> > On 9/7/23 15:33, Matthew Wilcox wrote:
> > > On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
> > > > On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> > > > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > > > +{
> > > > > +    return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> > > > > +}
> > > > I would prefer you move the various RWSEM_* count bit macros from
> > > > kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
> > > > RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
> > > Just to be clear, you want the ~50 lines from:
> > > 
> > > /*
> > >   * On 64-bit architectures, the bit definitions of the count are:
> > > ...
> > > #define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
> > > RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
> > > 
> > > moved from rwsem.c to rwsem.h?
> > > 
> > > Or just these four lines:
> > > 
> > > #define RWSEM_WRITER_LOCKED     (1UL << 0)
> > > #define RWSEM_FLAG_WAITERS      (1UL << 1)
> > > #define RWSEM_FLAG_HANDOFF      (1UL << 2)
> > > #define RWSEM_FLAG_READFAIL     (1UL << (BITS_PER_LONG - 1))
> > 
> > I think just the first 3 lines will be enough. Maybe a bit of comment
> > about these bit flags in the count atomic_long value.
> 
> Actually, the old rwsem implementation won't allow you to reliably determine
> if a rwsem is write locked because the xadd instruction is used for write
> locking and the code had to back out the WRITER_BIAS if the attempt failed.
> Maybe that is why XFS has its own code to check if a rwsem is write locked
> which is needed with the old rwsem implementation.

mrlocks pre-date rwsems entirely on Linux.  mrlocks were introduced
to XFS as part of the port from Irix back in 2000. The port originally
had an 'ismrlocked()' function for checking lock state.

In 2003, this was expanded to allow explicit lock type checks via
'mrislocked_access()' and 'mrislocked_update()' wrappers that checked
internal counters to determine how the lock was held.

In 2004, the mrlock was converted to use the generic kernel rwsem
implementation, and because that couldn't be used to track writers,
the mrlock included a mr_writer boolean field to indicate it was
write locked for the purpose of implementing the existing debug
checks. Hence the mrlock debug code has always had reliable
differentiation of read vs write state, whereas we couldn't do that
natively with rwsems for a real long time.
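
As a rough userspace sketch of that arrangement (not the actual XFS
code): the toy_rwsem below is a hypothetical stand-in for a lock with
no "is it write locked?" query, and every mr_sketch_* name is made up
for illustration. Only the mr_writer field name comes from the
description above. The wrapper records write ownership itself so a
debug check can ask about it later.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy lock (assumption, not the kernel rwsem):
 * 0 = free, >0 = number of readers, -1 = one writer.
 * It deliberately exposes no "is write locked" query.
 */
struct toy_rwsem {
	atomic_long count;
};

/* Hypothetical mrlock-style wrapper: the lock plus a writer flag. */
struct mrlock_sketch {
	struct toy_rwsem mr_lock;
	bool mr_writer;		/* true only while held for write */
};

static void mr_sketch_init(struct mrlock_sketch *mrp)
{
	atomic_init(&mrp->mr_lock.count, 0);
	mrp->mr_writer = false;
}

/* Try to take the lock shared; fails while a writer holds it. */
static bool mr_sketch_tryaccess(struct mrlock_sketch *mrp)
{
	long c = atomic_load(&mrp->mr_lock.count);

	while (c >= 0) {
		/* On failure, c is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&mrp->mr_lock.count, &c, c + 1))
			return true;
	}
	return false;
}

/* Try to take the lock exclusive; record ownership once we have it. */
static bool mr_sketch_tryupdate(struct mrlock_sketch *mrp)
{
	long expected = 0;

	if (!atomic_compare_exchange_strong(&mrp->mr_lock.count, &expected, -1L))
		return false;
	mrp->mr_writer = true;
	return true;
}

static void mr_sketch_unlock_update(struct mrlock_sketch *mrp)
{
	mrp->mr_writer = false;	/* clear the flag before dropping the lock */
	atomic_store(&mrp->mr_lock.count, 0);
}

static void mr_sketch_unlock_access(struct mrlock_sketch *mrp)
{
	atomic_fetch_sub(&mrp->mr_lock.count, 1);
}

/* Debug query: meant to be called by a context that holds the lock. */
static bool mr_sketch_islocked_update(const struct mrlock_sketch *mrp)
{
	return mrp->mr_writer;
}
```

The flag is only written with the lock held exclusive, so a caller
that holds the lock sees a stable value when it checks.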

The mrlocks have essentially remained unchanged since 2004 - this
long predates lockdep, and it lives on because it gives us something
lockdep doesn't: zero overhead locking validation.

> The new implementation makes this check reliable. Still it is not easy to
> check if a rwsem is read locked as the check will be rather complicated and
> probably racy.

You can't look at these locking checks in isolation. These checks
are done in code paths where we expect the caller to have already
locked the rwsem in the manner required. Hence there should be no races
with rwsem state changes at all.
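
A minimal userspace sketch of that usage pattern, assuming a count
word whose bit 0 means "write locked" as in the patch quoted above.
All sketch_* names are hypothetical, and the failure counter stands in
for a real assert so the example can be exercised without aborting.
The check sits at the top of a function whose caller is required to
hold the lock for write already, so the state it samples cannot change
underneath it.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_WRITER_LOCKED	(1UL << 0)	/* bit 0, as in the patch */

struct sketch_rwsem {
	atomic_long count;
};

static inline bool sketch_is_write_locked(struct sketch_rwsem *sem)
{
	return atomic_load(&sem->count) & SKETCH_WRITER_LOCKED;
}

/* Single-threaded stand-ins for down_write()/up_write(). */
static void sketch_down_write(struct sketch_rwsem *sem)
{
	atomic_fetch_or(&sem->count, (long)SKETCH_WRITER_LOCKED);
}

static void sketch_up_write(struct sketch_rwsem *sem)
{
	atomic_fetch_and(&sem->count, ~(long)SKETCH_WRITER_LOCKED);
}

/* Counts violations instead of aborting, purely for demonstration. */
static unsigned long sketch_assert_failures;

#define SKETCH_ASSERT_WRITE_LOCKED(sem)				\
	do {							\
		if (!sketch_is_write_locked(sem))		\
			sketch_assert_failures++;		\
	} while (0)

struct sketch_inode {
	struct sketch_rwsem lock;
	long size;
};

/* The caller must already hold ip->lock for write. */
static void sketch_set_size(struct sketch_inode *ip, long size)
{
	SKETCH_ASSERT_WRITE_LOCKED(&ip->lock);
	ip->size = size;
}
```

Calling sketch_set_size() without the lock bumps the counter; calling
it between sketch_down_write() and sketch_up_write() does not.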

If we see a locking assert fire, it means we either screwed up
the XFS locking completely (no races necessary), or there's a bug in
the rwsem implementation. The latter case has occurred several
times; the rwsem locking checks in XFS have uncovered more than one
rwsem implementation bug in the past...

IOWs, the explicit lock state checks and asserts provide a
lock implementation validation mechanism that lockdep doesn't....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx