On Tue, Jun 6, 2023 at 2:55 AM <xiubli@xxxxxxxxxx> wrote: > > From: Xiubo Li <xiubli@xxxxxxxxxx> > > There is a race between capsnaps flush and removing the inode from > 'mdsc->snap_flush_list' list: > > == Thread A == == Thread B == > ceph_queue_cap_snap() > -> allocate 'capsnapA' > ->ihold('&ci->vfs_inode') > ->add 'capsnapA' to 'ci->i_cap_snaps' > ->add 'ci' to 'mdsc->snap_flush_list' > ... > == Thread C == > ceph_flush_snaps() > ->__ceph_flush_snaps() > ->__send_flush_snap() > handle_cap_flushsnap_ack() > ->iput('&ci->vfs_inode') > this also will release 'ci' > ... > == Thread D == > ceph_handle_snap() > ->flush_snaps() > ->iterate 'mdsc->snap_flush_list' > ->get the stale 'ci' > ->remove 'ci' from ->ihold(&ci->vfs_inode) this > 'mdsc->snap_flush_list' will WARNING > > To fix this we will increase the inode's i_count ref when adding 'ci' > to the 'mdsc->snap_flush_list' list. > > Cc: stable@xxxxxxxxxxxxxxx > URL: https://bugzilla.redhat.com/show_bug.cgi?id=2209299 > Reviewed-by: Milind Changire <mchangir@xxxxxxxxxx> > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > --- > > V3: > - Fix two minor typo in commit comments. > > > > fs/ceph/caps.c | 6 ++++++ > fs/ceph/snap.c | 4 +++- > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c > index feabf4cc0c4f..7c2cb813aba4 100644 > --- a/fs/ceph/caps.c > +++ b/fs/ceph/caps.c > @@ -1684,6 +1684,7 @@ void ceph_flush_snaps(struct ceph_inode_info *ci, > struct inode *inode = &ci->netfs.inode; > struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc; > struct ceph_mds_session *session = NULL; > + int put = 0; Hi Xiubo, Nit: renaming this variable to need_put and making it a bool would communicate the intent better. > int mds; > > dout("ceph_flush_snaps %p\n", inode); > @@ -1728,8 +1729,13 @@ void ceph_flush_snaps(struct ceph_inode_info *ci, > ceph_put_mds_session(session); > /* we flushed them all; remove this inode from the queue */ > spin_lock(&mdsc->snap_flush_lock); > + if (!list_empty(&ci->i_snap_flush_item)) > + put++; What are the cases when ci is expected to not be on snap_flush_list list (and therefore there is no corresponding reference to put)? The reason I'm asking is that ceph_flush_snaps() is called from two other places directly (i.e. without iterating snap_flush_list list) and then __ceph_flush_snaps() is called from two yet other places. The problem that we are presented with here is that __ceph_flush_snaps() effectively consumes a reference on ci. Is ci protected from being freed by handle_cap_flushsnap_ack() very soon after __send_flush_snap() returns in all these other places? Thanks, Ilya