On Wed, May 31, 2023 at 1:33 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > On 5/31/23 19:09, Ilya Dryomov wrote: > > On Thu, May 25, 2023 at 4:45 AM <xiubli@xxxxxxxxxx> wrote: > >> From: Xiubo Li <xiubli@xxxxxxxxxx> > >> > >> There is racy between capsnaps flush and removing the inode from > >> 'mdsc->snap_flush_list' list: > >> > >> Thread A Thread B > >> ceph_queue_cap_snap() > >> -> allocate 'capsnapA' > >> ->ihold('&ci->vfs_inode') > >> ->add 'capsnapA' to 'ci->i_cap_snaps' > >> ->add 'ci' to 'mdsc->snap_flush_list' > >> ... > >> ceph_flush_snaps() > >> ->__ceph_flush_snaps() > >> ->__send_flush_snap() > >> handle_cap_flushsnap_ack() > >> ->iput('&ci->vfs_inode') > >> this also will release 'ci' > >> ... > >> ceph_handle_snap() > >> ->flush_snaps() > >> ->iterate 'mdsc->snap_flush_list' > >> ->get the stale 'ci' > >> ->remove 'ci' from ->ihold(&ci->vfs_inode) this > >> 'mdsc->snap_flush_list' will WARNING > >> > >> To fix this we will remove the 'ci' from 'mdsc->snap_flush_list' > >> list just before '__send_flush_snaps()' to make sure the flushsnap > >> 'ack' will always after removing the 'ci' from 'snap_flush_list'. > > Hi Xiubo, > > > > I'm not sure I'm following the logic here. If the issue is that the > > inode can be released by handle_cap_flushsnap_ack(), meaning that ci is > > unsafe to dereference after the ack is received, what makes e.g. the > > following snippet in __ceph_flush_snaps() work: > > > > ret = __send_flush_snap(inode, session, capsnap, cap->mseq, > > oldest_flush_tid); > > if (ret < 0) { > > pr_err("__flush_snaps: error sending cap flushsnap, " > > "ino (%llx.%llx) tid %llu follows %llu\n", > > ceph_vinop(inode), cf->tid, capsnap->follows); > > } > > > > ceph_put_cap_snap(capsnap); > > spin_lock(&ci->i_ceph_lock); > > > > If the ack is handled after capsnap is put but before ci->i_ceph_lock > > is reacquired, could use-after-free occur inside spin_lock()? > > Yeah, certainly this could happen. > > After the 'ci' being freed it's possible that the 'ci' is still cached > in the 'ceph_inode_cachep' slab caches or reused by others, so > dereferring the 'ci' mostly won't crash. But just before freeing 'ci' Dereferencing an invalid pointer is a major bug regardless of whether it leads to a crash. Crashing fast is actually a pretty good case as far as potential outcomes go. This needs to be addressed: just removing ci from mdsc->snap_flush_list earlier obviously doesn't cut it. Thanks, Ilya