On Thu, May 25, 2023 at 4:45 AM <xiubli@xxxxxxxxxx> wrote: > > From: Xiubo Li <xiubli@xxxxxxxxxx> > > There is racy between capsnaps flush and removing the inode from > 'mdsc->snap_flush_list' list: > > Thread A Thread B > ceph_queue_cap_snap() > -> allocate 'capsnapA' > ->ihold('&ci->vfs_inode') > ->add 'capsnapA' to 'ci->i_cap_snaps' > ->add 'ci' to 'mdsc->snap_flush_list' > ... > ceph_flush_snaps() > ->__ceph_flush_snaps() > ->__send_flush_snap() > handle_cap_flushsnap_ack() > ->iput('&ci->vfs_inode') > this also will release 'ci' > ... > ceph_handle_snap() > ->flush_snaps() > ->iterate 'mdsc->snap_flush_list' > ->get the stale 'ci' > ->remove 'ci' from ->ihold(&ci->vfs_inode) this > 'mdsc->snap_flush_list' will WARNING > > To fix this we will remove the 'ci' from 'mdsc->snap_flush_list' > list just before '__send_flush_snaps()' to make sure the flushsnap > 'ack' will always after removing the 'ci' from 'snap_flush_list'. Hi Xiubo, I'm not sure I'm following the logic here. If the issue is that the inode can be released by handle_cap_flushsnap_ack(), meaning that ci is unsafe to dereference after the ack is received, what makes e.g. the following snippet in __ceph_flush_snaps() work: ret = __send_flush_snap(inode, session, capsnap, cap->mseq, oldest_flush_tid); if (ret < 0) { pr_err("__flush_snaps: error sending cap flushsnap, " "ino (%llx.%llx) tid %llu follows %llu\n", ceph_vinop(inode), cf->tid, capsnap->follows); } ceph_put_cap_snap(capsnap); spin_lock(&ci->i_ceph_lock); If the ack is handled after capsnap is put but before ci->i_ceph_lock is reacquired, could use-after-free occur inside spin_lock()? Thanks, Ilya > > Cc: stable@xxxxxxxxxxxxxxx > URL: https://bugzilla.redhat.com/show_bug.cgi?id=2209299 > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > --- > fs/ceph/caps.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c > index feabf4cc0c4f..a8f890b3bb9a 100644 > --- a/fs/ceph/caps.c > +++ b/fs/ceph/caps.c > @@ -1595,6 +1595,11 @@ static void __ceph_flush_snaps(struct ceph_inode_info *ci, > > dout("__flush_snaps %p session %p\n", inode, session); > > + /* we will flush them all; remove this inode from the queue */ > + spin_lock(&mdsc->snap_flush_lock); > + list_del_init(&ci->i_snap_flush_item); > + spin_unlock(&mdsc->snap_flush_lock); > + > list_for_each_entry(capsnap, &ci->i_cap_snaps, ci_item) { > /* > * we need to wait for sync writes to complete and for dirty > @@ -1726,10 +1731,6 @@ void ceph_flush_snaps(struct ceph_inode_info *ci, > *psession = session; > else > ceph_put_mds_session(session); > - /* we flushed them all; remove this inode from the queue */ > - spin_lock(&mdsc->snap_flush_lock); > - list_del_init(&ci->i_snap_flush_item); > - spin_unlock(&mdsc->snap_flush_lock); > } > > /* > -- > 2.40.1 >