Re: [PATCH v7 2/2] ceph: blocklist the kclient when receiving corrupted snap trace

On Thu, Feb 2, 2023 at 3:30 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 01/02/2023 21:12, Ilya Dryomov wrote:
> > On Wed, Feb 1, 2023 at 2:37 AM <xiubli@xxxxxxxxxx> wrote:
> >> From: Xiubo Li <xiubli@xxxxxxxxxx>
> >>
> >> When we receive a corrupted snap trace we don't know what exactly
> >> has happened on the MDS side, and we shouldn't continue issuing
> >> I/Os and metadata requests to the MDS, which could corrupt data or
> >> return incorrect contents.
> >>
> >> This patch blocks all further I/O and MDS requests immediately and
> >> then evicts the kclient itself.
> >>
> >> The reason we still need to evict the kclient just after blocking
> >> all further I/Os is so that the MDS can revoke the caps faster.
> >>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> URL: https://tracker.ceph.com/issues/57686
> >> Reviewed-by: Venky Shankar <vshankar@xxxxxxxxxx>
> >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> >> ---
> >>   fs/ceph/addr.c       | 17 +++++++++++++++--
> >>   fs/ceph/caps.c       | 16 +++++++++++++---
> >>   fs/ceph/file.c       |  9 +++++++++
> >>   fs/ceph/mds_client.c | 28 +++++++++++++++++++++++++---
> >>   fs/ceph/snap.c       | 38 ++++++++++++++++++++++++++++++++++++--
> >>   fs/ceph/super.h      |  1 +
> >>   6 files changed, 99 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> >> index 8c74871e37c9..cac4083e387a 100644
> >> --- a/fs/ceph/addr.c
> >> +++ b/fs/ceph/addr.c
> >> @@ -305,7 +305,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
> >>          struct inode *inode = rreq->inode;
> >>          struct ceph_inode_info *ci = ceph_inode(inode);
> >>          struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> >> -       struct ceph_osd_request *req;
> >> +       struct ceph_osd_request *req = NULL;
> >>          struct ceph_vino vino = ceph_vino(inode);
> >>          struct iov_iter iter;
> >>          struct page **pages;
> >> @@ -313,6 +313,11 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
> >>          int err = 0;
> >>          u64 len = subreq->len;
> >>
> >> +       if (ceph_inode_is_shutdown(inode)) {
> >> +               err = -EIO;
> >> +               goto out;
> >> +       }
> >> +
> >>          if (ceph_has_inline_data(ci) && ceph_netfs_issue_op_inline(subreq))
> >>                  return;
> >>
> >> @@ -563,6 +568,9 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
> >>
> >>          dout("writepage %p idx %lu\n", page, page->index);
> >>
> >> +       if (ceph_inode_is_shutdown(inode))
> >> +               return -EIO;
> >> +
> >>          /* verify this is a writeable snap context */
> >>          snapc = page_snap_context(page);
> >>          if (!snapc) {
> >> @@ -1643,7 +1651,7 @@ int ceph_uninline_data(struct file *file)
> >>          struct ceph_inode_info *ci = ceph_inode(inode);
> >>          struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> >>          struct ceph_osd_request *req = NULL;
> >> -       struct ceph_cap_flush *prealloc_cf;
> >> +       struct ceph_cap_flush *prealloc_cf = NULL;
> >>          struct folio *folio = NULL;
> >>          u64 inline_version = CEPH_INLINE_NONE;
> >>          struct page *pages[1];
> >> @@ -1657,6 +1665,11 @@ int ceph_uninline_data(struct file *file)
> >>          dout("uninline_data %p %llx.%llx inline_version %llu\n",
> >>               inode, ceph_vinop(inode), inline_version);
> >>
> >> +       if (ceph_inode_is_shutdown(inode)) {
> >> +               err = -EIO;
> >> +               goto out;
> >> +       }
> >> +
> >>          if (inline_version == CEPH_INLINE_NONE)
> >>                  return 0;
> >>
> >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >> index f75ad432f375..210e40037881 100644
> >> --- a/fs/ceph/caps.c
> >> +++ b/fs/ceph/caps.c
> >> @@ -4078,6 +4078,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >>          void *p, *end;
> >>          struct cap_extra_info extra_info = {};
> >>          bool queue_trunc;
> >> +       bool close_sessions = false;
> >>
> >>          dout("handle_caps from mds%d\n", session->s_mds);
> >>
> >> @@ -4215,9 +4216,13 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >>                  realm = NULL;
> >>                  if (snaptrace_len) {
> >>                          down_write(&mdsc->snap_rwsem);
> >> -                       ceph_update_snap_trace(mdsc, snaptrace,
> >> -                                              snaptrace + snaptrace_len,
> >> -                                              false, &realm);
> >> +                       if (ceph_update_snap_trace(mdsc, snaptrace,
> >> +                                                  snaptrace + snaptrace_len,
> >> +                                                  false, &realm)) {
> >> +                               up_write(&mdsc->snap_rwsem);
> >> +                               close_sessions = true;
> >> +                               goto done;
> >> +                       }
> >>                          downgrade_write(&mdsc->snap_rwsem);
> >>                  } else {
> >>                          down_read(&mdsc->snap_rwsem);
> >> @@ -4277,6 +4282,11 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >>          iput(inode);
> >>   out:
> >>          ceph_put_string(extra_info.pool_ns);
> >> +
> >> +       /* Defer closing the sessions until the s_mutex has been released */
> >> +       if (close_sessions)
> >> +               ceph_mdsc_close_sessions(mdsc);
> >> +
> >>          return;
> >>
> >>   flush_cap_releases:
> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> index 764598e1efd9..3fbf4993d15d 100644
> >> --- a/fs/ceph/file.c
> >> +++ b/fs/ceph/file.c
> >> @@ -943,6 +943,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> >>          dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
> >>               (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
> >>
> >> +       if (ceph_inode_is_shutdown(inode))
> >> +               return -EIO;
> > Hi Xiubo,
> >
> > ceph_sync_read() is called only from ceph_read_iter() which already
> > checks for ceph_inode_is_shutdown() (although the generated error is
> > ESTALE instead of EIO).  Is the new one in ceph_sync_read() actually
> > needed?
> >
> > If the answer is yes, why hasn't a corresponding check been added to
> > ceph_sync_write()?
>
> I originally generated this patch on top of the fscrypt patches, where
> ceph_sync_read() is renamed to __ceph_sync_read() and is also called
> by fill_fscrypt_truncate(). I simply forgot about this and didn't
> adjust it after rebasing.
>
> I have updated the 'wip-snap-trace-blocklist' branch to remove the
> check here and from ceph_direct_read_write(), and I will fix this up
> later in the fscrypt patches.

Hi Xiubo,

The latest revision looks fine.  I folded the "ceph: dump the msg when
receiving a corrupt snap trace" follow-up into this patch and pulled the
result into master.

I also rebased testing accordingly; feel free to perform the necessary
fscrypt-related fixups there!

Thanks,

                Ilya


