On 5/31/23 20:01, Ilya Dryomov wrote:
On Wed, May 31, 2023 at 1:33 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
On 5/31/23 19:09, Ilya Dryomov wrote:
On Thu, May 25, 2023 at 4:45 AM <xiubli@xxxxxxxxxx> wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
There is racy between capsnaps flush and removing the inode from
'mdsc->snap_flush_list' list:
Thread A Thread B
ceph_queue_cap_snap()
-> allocate 'capsnapA'
->ihold('&ci->vfs_inode')
->add 'capsnapA' to 'ci->i_cap_snaps'
->add 'ci' to 'mdsc->snap_flush_list'
...
ceph_flush_snaps()
->__ceph_flush_snaps()
->__send_flush_snap()
handle_cap_flushsnap_ack()
->iput('&ci->vfs_inode')
this also will release 'ci'
...
ceph_handle_snap()
->flush_snaps()
->iterate 'mdsc->snap_flush_list'
->get the stale 'ci'
->remove 'ci' from ->ihold(&ci->vfs_inode) this
'mdsc->snap_flush_list' will WARNING
To fix this we will remove the 'ci' from 'mdsc->snap_flush_list'
list just before '__send_flush_snaps()' to make sure the flushsnap
'ack' will always after removing the 'ci' from 'snap_flush_list'.
Hi Xiubo,
I'm not sure I'm following the logic here. If the issue is that the
inode can be released by handle_cap_flushsnap_ack(), meaning that ci is
unsafe to dereference after the ack is received, what makes e.g. the
following snippet in __ceph_flush_snaps() work:
ret = __send_flush_snap(inode, session, capsnap, cap->mseq,
oldest_flush_tid);
if (ret < 0) {
pr_err("__flush_snaps: error sending cap flushsnap, "
"ino (%llx.%llx) tid %llu follows %llu\n",
ceph_vinop(inode), cf->tid, capsnap->follows);
}
ceph_put_cap_snap(capsnap);
spin_lock(&ci->i_ceph_lock);
If the ack is handled after capsnap is put but before ci->i_ceph_lock
is reacquired, could use-after-free occur inside spin_lock()?
Yeah, certainly this could happen.
After the 'ci' being freed it's possible that the 'ci' is still cached
in the 'ceph_inode_cachep' slab caches or reused by others, so
dereferring the 'ci' mostly won't crash. But just before freeing 'ci'
Dereferencing an invalid pointer is a major bug regardless of whether
it leads to a crash. Crashing fast is actually a pretty good case as
far as potential outcomes go.
This needs to be addressed: just removing ci from mdsc->snap_flush_list
earlier obviously doesn't cut it.
This could make sure that the 'ci' will be remove from the
'mdsc->snap_flush_list' list before the ack.
While there is another method to fix this, which is to increase the
'i_count' instead when adding the 'ci' to the 'mdsc->snap_flush_list'
list. This one seems safer ?
Thanks
- Xiubo
Thanks,
Ilya