Re: [PATCH v3 4/6] inode: port __I_NEW to var event

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 23-08-24 14:47:38, Christian Brauner wrote:
> Port the __I_NEW mechanism to use the new var event mechanism.
> 
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> ---
> I'm not fully convinced that READ_ONCE() in wait_on_inode() is
> sufficient when combined with smp_mb() before wake_up_var(). Maybe we
> need smp_store_release() on inode->i_state before smp_mb() and paired
> with smp_load_acquire() in wait_on_inode().
> ---
>  fs/bcachefs/fs.c          | 10 ++++++----
>  fs/dcache.c               |  7 ++++++-
>  fs/inode.c                | 32 ++++++++++++++++++++++++--------
>  include/linux/writeback.h |  3 ++-
>  4 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
> index 94c392abef65..c0900c0c0f8a 100644
> --- a/fs/bcachefs/fs.c
> +++ b/fs/bcachefs/fs.c
> @@ -1644,14 +1644,16 @@ void bch2_evict_subvolume_inodes(struct bch_fs *c, snapshot_id_list *s)
>  				break;
>  			}
>  		} else if (clean_pass && this_pass_clean) {
> -			wait_queue_head_t *wq = bit_waitqueue(&inode->v.i_state, __I_NEW);
> -			DEFINE_WAIT_BIT(wait, &inode->v.i_state, __I_NEW);
> +			struct wait_bit_queue_entry wqe;
> +			struct wait_queue_head *wq_head;
>  
> -			prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> +			wq_head = inode_bit_waitqueue(&wqe, &inode->v, __I_NEW);
> +			prepare_to_wait_event(wq_head, &wqe.wq_entry,
> +					      TASK_UNINTERRUPTIBLE);
>  			mutex_unlock(&c->vfs_inodes_lock);
>  
>  			schedule();
> -			finish_wait(wq, &wait.wq_entry);
> +			finish_wait(wq_head, &wqe.wq_entry);

The opencoded waits are somewhat ugly. I understand this is because of
c->vfs_inodes_lock but perhaps we could introduce wait_on_inode() with some
macro magic (to do what we need before going to sleep) to make it easier?

> diff --git a/fs/inode.c b/fs/inode.c
> index 877c64a1bf63..37f20c7c2f72 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -734,7 +734,13 @@ static void evict(struct inode *inode)
>  	 * used as an indicator whether blocking on it is safe.
>  	 */
>  	spin_lock(&inode->i_lock);
> -	wake_up_bit(&inode->i_state, __I_NEW);
> +	/*
> +	 * Pairs with the barrier in prepare_to_wait_event() to make sure
> +	 * ___wait_var_event() either sees the bit cleared or
> +	 * waitqueue_active() check in wake_up_var() sees the waiter.
> +	 */
> +	smp_mb();

Why did you add smp_mb() here? We wake up on inode state change which has
happened quite some time ago and before several things guaranteeing a
barrier...

> +	inode_wake_up_bit(inode, __I_NEW);
>  	BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
>  	spin_unlock(&inode->i_lock);
>  
...
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 56b85841ae4c..8f651bb0a1a5 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -200,7 +200,8 @@ void inode_io_list_del(struct inode *inode);
>  /* writeback.h requires fs.h; it, too, is not included from here. */
>  static inline void wait_on_inode(struct inode *inode)
>  {
> -	wait_on_bit(&inode->i_state, __I_NEW, TASK_UNINTERRUPTIBLE);
> +	wait_var_event(inode_state_wait_address(inode, __I_NEW),
> +		       !(READ_ONCE(inode->i_state) & I_NEW));
>  }
>  
>  #ifdef CONFIG_CGROUP_WRITEBACK

As far as memory ordering goes, this looks good to me. But READ_ONCE() is
not really guaranteed to be enough in terms of what inode state you're
going to see when racing with i_state updates. i_state updates would have
to happen through WRITE_ONCE() for this to be safe (wrt insane compiler
deoptimizations ;)). Now that's not a new problem so I'm not sure we need
to deal with it in this patch set but it would probably deserve a comment.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux