Re: [PATCH v3] ceph: fix use-after-free bug for inodes when flushing capsnaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/7/23 16:03, Ilya Dryomov wrote:
On Tue, Jun 6, 2023 at 3:30 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:

On 6/6/23 18:21, Ilya Dryomov wrote:
On Tue, Jun 6, 2023 at 2:55 AM <xiubli@xxxxxxxxxx> wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

There is a race between capsnaps flush and removing the inode from
'mdsc->snap_flush_list' list:

     == Thread A ==                     == Thread B ==
ceph_queue_cap_snap()
   -> allocate 'capsnapA'
   ->ihold('&ci->vfs_inode')
   ->add 'capsnapA' to 'ci->i_cap_snaps'
   ->add 'ci' to 'mdsc->snap_flush_list'
      ...
     == Thread C ==
ceph_flush_snaps()
   ->__ceph_flush_snaps()
    ->__send_flush_snap()
                                  handle_cap_flushsnap_ack()
                                   ->iput('&ci->vfs_inode')
                                     this also will release 'ci'
                                      ...
                                        == Thread D ==
                                  ceph_handle_snap()
                                   ->flush_snaps()
                                    ->iterate 'mdsc->snap_flush_list'
                                     ->get the stale 'ci'
   ->remove 'ci' from                ->ihold(&ci->vfs_inode) this
     'mdsc->snap_flush_list'           will WARNING

To fix this we will increase the inode's i_count ref when adding 'ci'
to the 'mdsc->snap_flush_list' list.

Cc: stable@xxxxxxxxxxxxxxx
URL: https://bugzilla.redhat.com/show_bug.cgi?id=2209299
Reviewed-by: Milind Changire <mchangir@xxxxxxxxxx>
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---

V3:
- Fix two minor typo in commit comments.



   fs/ceph/caps.c | 6 ++++++
   fs/ceph/snap.c | 4 +++-
   2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index feabf4cc0c4f..7c2cb813aba4 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1684,6 +1684,7 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
          struct inode *inode = &ci->netfs.inode;
          struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
          struct ceph_mds_session *session = NULL;
+       int put = 0;
Hi Xiubo,

Nit: renaming this variable to need_put and making it a bool would
communicate the intent better.
Hi Ilya

Sure, will update it.

          int mds;

          dout("ceph_flush_snaps %p\n", inode);
@@ -1728,8 +1729,13 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
                  ceph_put_mds_session(session);
          /* we flushed them all; remove this inode from the queue */
          spin_lock(&mdsc->snap_flush_lock);
+       if (!list_empty(&ci->i_snap_flush_item))
+               put++;
What are the cases when ci is expected to not be on snap_flush_list
list (and therefore there is no corresponding reference to put)?

The reason I'm asking is that ceph_flush_snaps() is called from two
other places directly (i.e. without iterating snap_flush_list list) and
then __ceph_flush_snaps() is called from two yet other places.  The
problem that we are presented with here is that __ceph_flush_snaps()
effectively consumes a reference on ci.  Is ci protected from being
freed by handle_cap_flushsnap_ack() very soon after __send_flush_snap()
returns in all these other places?
There are 4 places will call the 'ceph_flush_snaps()':

Cscope tag: ceph_flush_snaps
     #   line  filename / context / line
     1   3221  fs/ceph/caps.c <<__ceph_put_cap_refs>>
               ceph_flush_snaps(ci, NULL);
     2   3336  fs/ceph/caps.c <<ceph_put_wrbuffer_cap_refs>>
               ceph_flush_snaps(ci, NULL);
     3   2243  fs/ceph/inode.c <<ceph_inode_work>>
               ceph_flush_snaps(ci, NULL);
     4    941  fs/ceph/snap.c <<flush_snaps>>
               ceph_flush_snaps(ci, &session);
Type number and <Enter> (q or empty cancels):

For #1 it will add the 'ci' to the 'mdsc->snap_flush_list' list by
calling '__ceph_finish_cap_snap()' and then call the
'ceph_flush_snaps()' directly or defer call it in the queue work in #3.

The #3 is the reason why we need the 'mdsc->snap_flush_list' list.

For #2 it won't add the 'ci' to the list because it will always call the
'ceph_flush_snaps()' directly.

For #4 it will call 'ceph_flush_snaps()' by iterating the
'mdsc->snap_flush_list' list just before the #3 being triggered.

The problem only exists in case of #1 --> #4, which will make the stale
'ci' to be held in the 'mdsc->snap_flush_list' list after 'capsnap' and
'ci' being freed. All the other cases are okay because the 'ci' will be
protected by increasing the ref when allocating the 'capsnap' and will
decrease the ref in 'handle_cap_flushsnap_ack()' when freeing the 'capsnap'.

Note: the '__ceph_flush_snaps()' won't increase the ref. The
'handle_cap_flushsnap_ack()' will just try to decrease the ref and only
in case the ref reaches to '0' will the 'ci' be freed.
So my question is: are all __ceph_flush_snaps() callers guaranteed to
hold an extra (i.e. one that is not tied to capsnap) reference on ci so
that when handle_cap_flushsnap_ack() drops one that is tied to capsnap
the reference count doesn't reach 0?  It sounds like you are confident
that there is no issue with ceph_flush_snaps() callers, but it would be
nice if you could confirm the same for bare __ceph_flush_snaps() call
sites in caps.c.

Yeah, checked the code again carefully.  I am sure that when calling the '__ceph_flush_snaps()' it has already well handled the reference too.

Once the 'capsnap' is in 'ci->i_cap_snaps' list the inode's reference must have already been increased, please see 'ceph_queue_cap_snap()', which will allocate and insert 'capsnap' to this list.

Except the 'mdsc->snap_flush_list' list, the 'capsnap' will be added to three other lists, which are 'ci->i_cap_snaps', 'mdsc->cap_flush_list' and 'ci->i_cap_flush_list'.


 744 INIT_LIST_HEAD(&capsnap->cap_flush.i_list);      --> ci->i_cap_flush_list  745 INIT_LIST_HEAD(&capsnap->cap_flush.g_list);    --> mdsc->cap_flush_list  746 INIT_LIST_HEAD(&capsnap->ci_item);                   --> ci->i_cap_snaps


But 'capsnap' will be removed from them all just before freeing the 'capsnap' and iput(inode), more detail please see:


  handle_cap_flushsnap_ack()
      -->  ceph_remove_capsnap()
            --> __ceph_remove_capsnap()
                 --> list_del_init(&capsnap->ci_item); // remove from 'ci->i_cap_snaps' list                  --> __detach_cap_flush_from_ci()         // remove from 'ci->i_cap_flush_list' list                 --> __detach_cap_flush_from_mdsc()} // remove from 'mdsc->cap_flush_list'
      --> ceph_put_cap_snap(capsnap)
      --> iput(inode)


The 'mdsc->cap_flush_list' list will be proected by 'mdsc->cap_dirty_lock', so do not have any issue here.

So I am sure that just before 'capsnap' being freed the 'ci' won't released in the above case.

Thanks

- Xiubo

Thanks,

                 Ilya





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux