On Wed, 2021-08-04 at 17:26 +0100, Luis Henriques wrote:
> Jeff Layton <jlayton@xxxxxxxxxx> writes:
> 
> > There is a race in ceph_put_snap_realm. The change to the nref and the
> > spinlock acquisition are not done atomically, so you could decrement nref,
> > and before you take the spinlock, the nref is incremented again. At that
> > point, you end up putting it on the empty list when it shouldn't be
> > there. Eventually __cleanup_empty_realms runs and frees it when it's
> > still in-use.
> > 
> > Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and
> > just drop the spinlock if we can get the rwsem.
> > 
> > Because these objects can also undergo a 0->1 refcount transition, we
> > must protect that change as well with the spinlock. Increment locklessly
> > unless the value is at 0, in which case we take the spinlock, increment
> > and then take it off the empty list if it did the 0->1 transition.
> > 
> > With these changes, I'm removing the dout() messages from these
> > functions, as well as in __put_snap_realm. They've always been racy, and
> > it's better to not print values that may be misleading.
> > 
> > Cc: stable@xxxxxxxxxxxxxxx
> > Cc: Sage Weil <sage@xxxxxxxxxx>
> > Reported-by: Mark Nelson <mnelson@xxxxxxxxxx>
> > URL: https://tracker.ceph.com/issues/46419
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/snap.c | 34 +++++++++++++++++-----------------
> >  1 file changed, 17 insertions(+), 17 deletions(-)
> > 
> > v2: No functional changes, but I cleaned up the comments a bit and
> > added another in __put_snap_realm.
> > 
> > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > index 9dbc92cfda38..158c11e96fb7 100644
> > --- a/fs/ceph/snap.c
> > +++ b/fs/ceph/snap.c
> > @@ -67,19 +67,19 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
> >  {
> >  	lockdep_assert_held(&mdsc->snap_rwsem);
> >  
> > -	dout("get_realm %p %d -> %d\n", realm,
> > -	     atomic_read(&realm->nref), atomic_read(&realm->nref)+1);
> >  	/*
> > -	 * since we _only_ increment realm refs or empty the empty
> > -	 * list with snap_rwsem held, adjusting the empty list here is
> > -	 * safe. we do need to protect against concurrent empty list
> > -	 * additions, however.
> > +	 * The 0->1 and 1->0 transitions must take the snap_empty_lock
> > +	 * atomically with the refcount change. Go ahead and bump the
> > +	 * nref here, unless it's 0, in which case we take the spinlock
> > +	 * and then do the increment and remove it from the list.
> >  	 */
> > -	if (atomic_inc_return(&realm->nref) == 1) {
> > -		spin_lock(&mdsc->snap_empty_lock);
> > +	if (atomic_add_unless(&realm->nref, 1, 0))
> 
> Here you could probably use atomic_inc_not_zero() instead. But other
> than that it looks good. Thanks a lot for solving yet another locking
> puzzle!
> 
> Reviewed-by: Luis Henriques <lhenriques@xxxxxxx>
> 
> Cheers,

Good point! That is a little clearer. I'll incorporate that change and
merge it.

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
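
[Editor's note: the quoted hunk above only shows the get side of the change.
For readers following the thread, here is a rough sketch of the put-side
pattern the changelog describes (the 1->0 transition guarded by
atomic_dec_and_lock). It is illustrative only, not the actual patch hunk:
the destroy helper and the empty-list member/head names
(destroy_snap_realm_stub, empty_item, snap_empty) are assumptions made for
the example.]

	/*
	 * Illustrative sketch only -- not the actual patch. Helper and
	 * list member names below are placeholders.
	 */
	static void put_snap_realm_sketch(struct ceph_mds_client *mdsc,
					  struct ceph_snap_realm *realm)
	{
		/*
		 * atomic_dec_and_lock() only takes snap_empty_lock when the
		 * count actually drops to zero, so the 1->0 transition and
		 * the empty-list manipulation happen under the same spinlock
		 * that the 0->1 path in ceph_get_snap_realm() takes.
		 */
		if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
			return;

		if (down_write_trylock(&mdsc->snap_rwsem)) {
			/* Got the rwsem: drop the spinlock and tear down now. */
			spin_unlock(&mdsc->snap_empty_lock);
			destroy_snap_realm_stub(mdsc, realm);
			up_write(&mdsc->snap_rwsem);
		} else {
			/*
			 * Couldn't get the rwsem: park the realm on the empty
			 * list and let __cleanup_empty_realms free it later.
			 */
			list_add(&realm->empty_item, &mdsc->snap_empty);
			spin_unlock(&mdsc->snap_empty_lock);
		}
	}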