Re: [PATCH RFC v2 1/6] fs: add i_state helpers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 22 Aug 2024, Christian Brauner wrote:
> The i_state member is an unsigned long so that it can be used with the
> wait bit infrastructure which expects unsigned long. This wastes 4 bytes
> which we're unlikely to ever use. Switch to using the var event wait
> mechanism using the address of the bit. Thanks to Linus for the address
> idea.
> 
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> ---
>  fs/inode.c         | 10 ++++++++++
>  include/linux/fs.h | 16 ++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 154f8689457f..f2a2f6351ec3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -472,6 +472,16 @@ static void __inode_add_lru(struct inode *inode, bool rotate)
>  		inode->i_state |= I_REFERENCED;
>  }
>  
> +struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
> +					    struct inode *inode, u32 bit)
> +{
> +        void *bit_address;
> +
> +        bit_address = inode_state_wait_address(inode, bit);
> +        init_wait_var_entry(wqe, bit_address, 0);
> +        return __var_waitqueue(bit_address);
> +}
> +
>  /*
>   * Add inode to LRU if needed (inode is unused and clean).
>   *
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 23e7d46b818a..a5b036714d74 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -744,6 +744,22 @@ struct inode {
>  	void			*i_private; /* fs or device private pointer */
>  } __randomize_layout;
>  
> +/*
> + * Get bit address from inode->i_state to use with wait_var_event()
> + * infrastructre.
> + */
> +#define inode_state_wait_address(inode, bit) ((char *)&(inode)->i_state + (bit))
> +
> +struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
> +					    struct inode *inode, u32 bit);
> +
> +static inline void inode_wake_up_bit(struct inode *inode, u32 bit)
> +{
> +	/* Ensure @bit will be seen cleared/set when caller is woken up. */

The above comment is wrong.  I think I once thought it was correct too
but now I know better (I hope).
A better comment might be
       /* Insert memory barrier as recommended by wake_up_var() */
but even that is unnecessary as we don't need the memory barrier.

A careful reading of memory-barriers.rst shows that *when the process is
actually woken* there are sufficient barriers in wake_up_process() and
prepare_wait_event() and the scheduler and (particularly)
set_current_state() so that a value set before the wake_up is seen after
the schedule().

So this barrier isn't about the bit.  This barrier is about the
wait_queue.  In particular it is about waitqueue_active() call at the
start of wake_up_var().  If that test wasn't there and if instead
wake_up_var() conditionally called __wake_up(), then there would be no
need for any barrier.  A comment near wake_up_bit() makes it clear that
the barrier is only needed when the spinlock is avoided.

On a weakly ordered arch, this test can be effected *before* the write
of the bit.  If the waiter adds itself to the wait queue and then tests
the bit before the bit is set, but after the waitqueue_active() test is
put in effect, then the wake_up will never be sent.

But ....  this is all academic of this code because you don't need a
barrier at all.  The wake_up happens in a spin_locked region, and the
wait is entirely inside the same spin_lock, except for the schedule.  A
later patch has:
     spin_unlock(); schedule(); spin_lock();

So there is no room for a race.  If the bit is cleared before the
wait_var_event() equivalent, then no wakeup is needed.  When the lock is
dropped after the bit is cleared the unlock will have all the barriers
needed for the bit to be visible.

The only other place that the bit can be cleared is concurrent with the
above schedule() while the spinlock isn't held by the waiter.  In that
case it is again clear that no barrier is needed - or that the
spin_unlock/lock provide all the needed barriers.

So a better comment would be

   /* no barrier needs as both waker and waiter are in spin-locked regions */

Thanks,
NeilBrown


> +	smp_mb();
> +	wake_up_var(inode_state_wait_address(inode, bit));
> +}
> +
>  struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
>  
>  static inline unsigned int i_blocksize(const struct inode *node)
> 
> -- 
> 2.43.0
> 
> 






[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux