Re: FAILED: patch "[PATCH] ceph: take snap_empty_lock atomically with snaprealm refcount" failed to apply to 5.13-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2021-08-15 at 14:40 +0200, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> The patch below does not apply to the 5.13-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.
> 
> thanks,
> 
> greg k-h
> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@xxxxxxxxxx>
> Date: Tue, 3 Aug 2021 12:47:34 -0400
> Subject: [PATCH] ceph: take snap_empty_lock atomically with snaprealm refcount
>  change
> 
> There is a race in ceph_put_snap_realm. The change to the nref and the
> spinlock acquisition are not done atomically, so you could decrement
> nref, and before you take the spinlock, the nref is incremented again.
> At that point, you end up putting it on the empty list when it
> shouldn't be there. Eventually __cleanup_empty_realms runs and frees
> it when it's still in-use.
> 
> Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
> and just drop the spinlock if we can get the rwsem.
> 
> Because these objects can also undergo a 0->1 refcount transition, we
> must protect that change as well with the spinlock. Increment locklessly
> unless the value is at 0, in which case we take the spinlock, increment
> and then take it off the empty list if it did the 0->1 transition.
> 
> With these changes, I'm removing the dout() messages from these
> functions, as well as in __put_snap_realm. They've always been racy, and
> it's better to not print values that may be misleading.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> URL: https://tracker.ceph.com/issues/46419
> Reported-by: Mark Nelson <mnelson@xxxxxxxxxx>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> Reviewed-by: Luis Henriques <lhenriques@xxxxxxx>
> Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
> 
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 4ac0606dcbd4..4c6bd1042c94 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -67,19 +67,19 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
>  {
>  	lockdep_assert_held(&mdsc->snap_rwsem);
>  
> -	dout("get_realm %p %d -> %d\n", realm,
> -	     atomic_read(&realm->nref), atomic_read(&realm->nref)+1);
>  	/*
> -	 * since we _only_ increment realm refs or empty the empty
> -	 * list with snap_rwsem held, adjusting the empty list here is
> -	 * safe.  we do need to protect against concurrent empty list
> -	 * additions, however.
> +	 * The 0->1 and 1->0 transitions must take the snap_empty_lock
> +	 * atomically with the refcount change. Go ahead and bump the
> +	 * nref here, unless it's 0, in which case we take the spinlock
> +	 * and then do the increment and remove it from the list.
>  	 */
> -	if (atomic_inc_return(&realm->nref) == 1) {
> -		spin_lock(&mdsc->snap_empty_lock);
> +	if (atomic_inc_not_zero(&realm->nref))
> +		return;
> +
> +	spin_lock(&mdsc->snap_empty_lock);
> +	if (atomic_inc_return(&realm->nref) == 1)
>  		list_del_init(&realm->empty_item);
> -		spin_unlock(&mdsc->snap_empty_lock);
> -	}
> +	spin_unlock(&mdsc->snap_empty_lock);
>  }
>  
>  static void __insert_snap_realm(struct rb_root *root,
> @@ -208,28 +208,28 @@ static void __put_snap_realm(struct ceph_mds_client *mdsc,
>  {
>  	lockdep_assert_held_write(&mdsc->snap_rwsem);
>  
> -	dout("__put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
> -	     atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
> +	/*
> +	 * We do not require the snap_empty_lock here, as any caller that
> +	 * increments the value must hold the snap_rwsem.
> +	 */
>  	if (atomic_dec_and_test(&realm->nref))
>  		__destroy_snap_realm(mdsc, realm);
>  }
>  
>  /*
> - * caller needn't hold any locks
> + * See comments in ceph_get_snap_realm. Caller needn't hold any locks.
>   */
>  void ceph_put_snap_realm(struct ceph_mds_client *mdsc,
>  			 struct ceph_snap_realm *realm)
>  {
> -	dout("put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
> -	     atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
> -	if (!atomic_dec_and_test(&realm->nref))
> +	if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
>  		return;
>  
>  	if (down_write_trylock(&mdsc->snap_rwsem)) {
> +		spin_unlock(&mdsc->snap_empty_lock);
>  		__destroy_snap_realm(mdsc, realm);
>  		up_write(&mdsc->snap_rwsem);
>  	} else {
> -		spin_lock(&mdsc->snap_empty_lock);
>  		list_add(&realm->empty_item, &mdsc->snap_empty);
>  		spin_unlock(&mdsc->snap_empty_lock);
>  	}
> 

Ahh, I forgot to account for some new lockdep annotation when I marked
these for stable. I think what we should probably do here is cherry-pick
these as prerequisites before applying:

    a6862e6708c1 ceph: add some lockdep assertions around snaprealm handling
    df2c0cb7f8e8 ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm

The first one should fix up the merge conflict, and the second will fix
up a couple of bogus lockdep warnings that pop up from a6862e6708c1.

Greg, does that sound OK? 
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux