Re: [PATCH RFC v2 1/6] fs: add i_state helpers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 22, 2024 at 08:12:15AM GMT, NeilBrown wrote:
> On Thu, 22 Aug 2024, Christian Brauner wrote:
> > The i_state member is an unsigned long so that it can be used with the
> > wait bit infrastructure which expects unsigned long. This wastes 4 bytes
> > which we're unlikely to ever use. Switch to using the var event wait
> > mechanism using the address of the bit. Thanks to Linus for the address
> > idea.
> > 
> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> > ---
> >  fs/inode.c         | 10 ++++++++++
> >  include/linux/fs.h | 16 ++++++++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 154f8689457f..f2a2f6351ec3 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -472,6 +472,16 @@ static void __inode_add_lru(struct inode *inode, bool rotate)
> >  		inode->i_state |= I_REFERENCED;
> >  }
> >  
> > +struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
> > +					    struct inode *inode, u32 bit)
> > +{
> > +        void *bit_address;
> > +
> > +        bit_address = inode_state_wait_address(inode, bit);
> > +        init_wait_var_entry(wqe, bit_address, 0);
> > +        return __var_waitqueue(bit_address);
> > +}
> > +
> >  /*
> >   * Add inode to LRU if needed (inode is unused and clean).
> >   *
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 23e7d46b818a..a5b036714d74 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -744,6 +744,22 @@ struct inode {
> >  	void			*i_private; /* fs or device private pointer */
> >  } __randomize_layout;
> >  
> > +/*
> > + * Get bit address from inode->i_state to use with wait_var_event()
> > + * infrastructre.
> > + */
> > +#define inode_state_wait_address(inode, bit) ((char *)&(inode)->i_state + (bit))
> > +
> > +struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
> > +					    struct inode *inode, u32 bit);
> > +
> > +static inline void inode_wake_up_bit(struct inode *inode, u32 bit)
> > +{
> > +	/* Ensure @bit will be seen cleared/set when caller is woken up. */
> 
> The above comment is wrong.  I think I once thought it was correct too
> but now I know better (I hope).
> A better comment might be
>        /* Insert memory barrier as recommended by wake_up_var() */
> but even that is unnecessary as we don't need the memory barrier.
> 
> A careful reading of memory-barriers.rst shows that *when the process is
> actually woken* there are sufficient barriers in wake_up_process() and
> prepare_wait_event() and the scheduler and (particularly)
> set_current_state() so that a value set before the wake_up is seen after
> the schedule().
> 
> So this barrier isn't about the bit.  This barrier is about the
> wait_queue.  In particular it is about waitqueue_active() call at the
> start of wake_up_var().  If that test wasn't there and if instead
> wake_up_var() conditionally called __wake_up(), then there would be no

Did you mean "unconditionally called"?

> need for any barrier.  A comment near wake_up_bit() makes it clear that
> the barrier is only needed when the spinlock is avoided.
> 
> On a weakly ordered arch, this test can be effected *before* the write
> of the bit.  If the waiter adds itself to the wait queue and then tests
> the bit before the bit is set, but after the waitqueue_active() test is
> put in effect, then the wake_up will never be sent.
> 
> But ....  this is all academic of this code because you don't need a
> barrier at all.  The wake_up happens in a spin_locked region, and the
> wait is entirely inside the same spin_lock, except for the schedule.  A
> later patch has:
>      spin_unlock(); schedule(); spin_lock();
> 
> So there is no room for a race.  If the bit is cleared before the
> wait_var_event() equivalent, then no wakeup is needed.  When the lock is
> dropped after the bit is cleared the unlock will have all the barriers
> needed for the bit to be visible.
> 
> The only other place that the bit can be cleared is concurrent with the
> above schedule() while the spinlock isn't held by the waiter.  In that
> case it is again clear that no barrier is needed - or that the
> spin_unlock/lock provide all the needed barriers.
> 
> So a better comment would be
> 
>    /* no barrier needs as both waker and waiter are in spin-locked regions */

Thanks for the analysis. I was under the impression that wait_on_inode()
was called in contexts where no barrier is guaranteed and the bit isn't
checked with spin_lock() held.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux