Re: [PATCH 1/3] vfs: remove lockdep bogosity in __sb_start_write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon 09-11-20 10:16:50, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> __sb_start_write has some weird looking lockdep code that claims to
> exist to handle nested freeze locking requests from xfs.  The code as
> written seems broken -- if we think we hold a read lock on any of the
> higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
> lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
> trylock.
> 
> However, it's not correct to downgrade a blocking lock attempt to a
> trylock unless the downgrading code or the callers are prepared to deal
> with that situation.  Neither __sb_start_write nor its callers handle
> this at all.  For example:
> 
> sb_start_pagefault ignores the return value completely, with the result
> that if xfs_filemap_fault loses a race with a different thread trying to
> fsfreeze, it will proceed without pagefault freeze protection (thereby
> breaking locking rules) and then unlocks the pagefault freeze lock that
> it doesn't own on its way out (thereby corrupting the lock state), which
> leads to a system hang shortly afterwards.
> 
> Normally, this won't happen because our ownership of a read lock on a
> higher freeze protection level blocks fsfreeze from grabbing a write
> lock on that higher level.  *However*, if lockdep is offline,
> lock_is_held_type unconditionally returns 1, which means that
> percpu_rwsem_is_held returns 1, which means that __sb_start_write
> unconditionally converts blocking freeze lock attempts into trylocks,
> even when we *don't* hold anything that would block a fsfreeze.
> 
> Apparently this all held together until 5.10-rc1, when bugs in lockdep
> caused lockdep to shut itself off early in an fstests run, and once
> fstests gets to the "race writes with freezer" tests, kaboom.  This
> might explain the long trail of vanishingly infrequent livelocks in
> fstests after lockdep goes offline that I've never been able to
> diagnose.
> 
> We could fix it by spinning on the trylock if wait==true, but AFAICT the
> locking works fine if lockdep is not built at all (and I didn't see any
> complaints running fstests overnight), so remove this snippet entirely.
> 
> NOTE: Commit f4b554af9931 in 2015 created the current weird logic (which
> used to exist in a different form in commit 5accdf82ba25c from 2012) in
> __sb_start_write.  XFS solved this whole problem in the late 2.6 era by
> creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
> grab intwrite freeze protection, thus making lockdep's solution
> unnecessary.  The commit claims that Dave Chinner explained that the
> trylock hack + comment could be removed, but nobody ever did.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>

Thanks for cleaning this up. You can add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
>  fs/super.c |   33 ++++-----------------------------
>  1 file changed, 4 insertions(+), 29 deletions(-)
> 
> 
> diff --git a/fs/super.c b/fs/super.c
> index a51c2083cd6b..e1fd667454d4 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1647,36 +1647,11 @@ EXPORT_SYMBOL(__sb_end_write);
>   */
>  int __sb_start_write(struct super_block *sb, int level, bool wait)
>  {
> -	bool force_trylock = false;
> -	int ret = 1;
> +	if (!wait)
> +		return percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
>  
> -#ifdef CONFIG_LOCKDEP
> -	/*
> -	 * We want lockdep to tell us about possible deadlocks with freezing
> -	 * but it's it bit tricky to properly instrument it. Getting a freeze
> -	 * protection works as getting a read lock but there are subtle
> -	 * problems. XFS for example gets freeze protection on internal level
> -	 * twice in some cases, which is OK only because we already hold a
> -	 * freeze protection also on higher level. Due to these cases we have
> -	 * to use wait == F (trylock mode) which must not fail.
> -	 */
> -	if (wait) {
> -		int i;
> -
> -		for (i = 0; i < level - 1; i++)
> -			if (percpu_rwsem_is_held(sb->s_writers.rw_sem + i)) {
> -				force_trylock = true;
> -				break;
> -			}
> -	}
> -#endif
> -	if (wait && !force_trylock)
> -		percpu_down_read(sb->s_writers.rw_sem + level-1);
> -	else
> -		ret = percpu_down_read_trylock(sb->s_writers.rw_sem + level-1);
> -
> -	WARN_ON(force_trylock && !ret);
> -	return ret;
> +	percpu_down_read(sb->s_writers.rw_sem + level-1);
> +	return 1;
>  }
>  EXPORT_SYMBOL(__sb_start_write);
>  
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux