On Thu, Feb 2, 2023 at 3:30 AM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 01/02/2023 21:12, Ilya Dryomov wrote:
> > On Wed, Feb 1, 2023 at 2:37 AM <xiubli@xxxxxxxxxx> wrote:
> >> From: Xiubo Li <xiubli@xxxxxxxxxx>
> >>
> >> When receiving a corrupted snap trace we don't know what exactly has
> >> happened on the MDS side, and we shouldn't continue doing I/O and
> >> metadata access to the MDS, which may corrupt data or return
> >> incorrect contents.
> >>
> >> This patch just blocks all further I/O and MDS requests immediately
> >> and then evicts the kclient itself.
> >>
> >> The reason we still need to evict the kclient right after blocking
> >> all further I/O is so that the MDS can revoke the caps faster.
> >>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> URL: https://tracker.ceph.com/issues/57686
> >> Reviewed-by: Venky Shankar <vshankar@xxxxxxxxxx>
> >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> >> ---
> >>  fs/ceph/addr.c       | 17 +++++++++++++++--
> >>  fs/ceph/caps.c       | 16 +++++++++++++---
> >>  fs/ceph/file.c       |  9 +++++++++
> >>  fs/ceph/mds_client.c | 28 +++++++++++++++++++++++++---
> >>  fs/ceph/snap.c       | 38 ++++++++++++++++++++++++++++++++++++--
> >>  fs/ceph/super.h      |  1 +
> >>  6 files changed, 99 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> >> index 8c74871e37c9..cac4083e387a 100644
> >> --- a/fs/ceph/addr.c
> >> +++ b/fs/ceph/addr.c
> >> @@ -305,7 +305,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
> >>          struct inode *inode = rreq->inode;
> >>          struct ceph_inode_info *ci = ceph_inode(inode);
> >>          struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> >> -        struct ceph_osd_request *req;
> >> +        struct ceph_osd_request *req = NULL;
> >>          struct ceph_vino vino = ceph_vino(inode);
> >>          struct iov_iter iter;
> >>          struct page **pages;
> >> @@ -313,6 +313,11 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
> >>          int err = 0;
> >>          u64 len = subreq->len;
> >>
> >> +        if (ceph_inode_is_shutdown(inode)) {
> >> +                err = -EIO;
> >> +                goto out;
> >> +        }
> >> +
> >>          if (ceph_has_inline_data(ci) && ceph_netfs_issue_op_inline(subreq))
> >>                  return;
> >>
> >> @@ -563,6 +568,9 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
> >>
> >>          dout("writepage %p idx %lu\n", page, page->index);
> >>
> >> +        if (ceph_inode_is_shutdown(inode))
> >> +                return -EIO;
> >> +
> >>          /* verify this is a writeable snap context */
> >>          snapc = page_snap_context(page);
> >>          if (!snapc) {
> >> @@ -1643,7 +1651,7 @@ int ceph_uninline_data(struct file *file)
> >>          struct ceph_inode_info *ci = ceph_inode(inode);
> >>          struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> >>          struct ceph_osd_request *req = NULL;
> >> -        struct ceph_cap_flush *prealloc_cf;
> >> +        struct ceph_cap_flush *prealloc_cf = NULL;
> >>          struct folio *folio = NULL;
> >>          u64 inline_version = CEPH_INLINE_NONE;
> >>          struct page *pages[1];
> >> @@ -1657,6 +1665,11 @@ int ceph_uninline_data(struct file *file)
> >>          dout("uninline_data %p %llx.%llx inline_version %llu\n",
> >>               inode, ceph_vinop(inode), inline_version);
> >>
> >> +        if (ceph_inode_is_shutdown(inode)) {
> >> +                err = -EIO;
> >> +                goto out;
> >> +        }
> >> +
> >>          if (inline_version == CEPH_INLINE_NONE)
> >>                  return 0;
> >>
> >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >> index f75ad432f375..210e40037881 100644
> >> --- a/fs/ceph/caps.c
> >> +++ b/fs/ceph/caps.c
> >> @@ -4078,6 +4078,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >>          void *p, *end;
> >>          struct cap_extra_info extra_info = {};
> >>          bool queue_trunc;
> >> +        bool close_sessions = false;
> >>
> >>          dout("handle_caps from mds%d\n", session->s_mds);
> >>
> >> @@ -4215,9 +4216,13 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >>          realm = NULL;
> >>          if (snaptrace_len) {
> >>                  down_write(&mdsc->snap_rwsem);
> >> -                ceph_update_snap_trace(mdsc, snaptrace,
> >> -                                       snaptrace + snaptrace_len,
> >> -                                       false, &realm);
> >> +                if (ceph_update_snap_trace(mdsc, snaptrace,
> >> +                                           snaptrace + snaptrace_len,
> >> +                                           false, &realm)) {
> >> +                        up_write(&mdsc->snap_rwsem);
> >> +                        close_sessions = true;
> >> +                        goto done;
> >> +                }
> >>                  downgrade_write(&mdsc->snap_rwsem);
> >>          } else {
> >>                  down_read(&mdsc->snap_rwsem);
> >> @@ -4277,6 +4282,11 @@ void ceph_handle_caps(struct ceph_mds_session *session,
> >>          iput(inode);
> >>  out:
> >>          ceph_put_string(extra_info.pool_ns);
> >> +
> >> +        /* Defer closing the sessions after s_mutex lock being released */
> >> +        if (close_sessions)
> >> +                ceph_mdsc_close_sessions(mdsc);
> >> +
> >>          return;
> >>
> >>  flush_cap_releases:
> >> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> index 764598e1efd9..3fbf4993d15d 100644
> >> --- a/fs/ceph/file.c
> >> +++ b/fs/ceph/file.c
> >> @@ -943,6 +943,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> >>          dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len,
> >>               (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
> >>
> >> +        if (ceph_inode_is_shutdown(inode))
> >> +                return -EIO;
> > Hi Xiubo,
> >
> > ceph_sync_read() is called only from ceph_read_iter(), which already
> > checks for ceph_inode_is_shutdown() (although the generated error is
> > ESTALE instead of EIO). Is the new one in ceph_sync_read() actually
> > needed?
> >
> > If the answer is yes, why hasn't a corresponding check been added to
> > ceph_sync_write()?
>
> Originally I generated this patch based on the fscrypt patches, where
> this function gets renamed to __ceph_sync_read() and is also called by
> fill_fscrypt_truncate(). I just forgot that and didn't adjust it after
> rebasing.
>
> I have updated the 'wip-snap-trace-blocklist' branch to remove the check
> from here and from ceph_direct_read_write(). I will fix this up later in
> the fscrypt patches.

Hi Xiubo,

The latest revision looks fine. I folded the "ceph: dump the msg when
receiving a corrupt snap trace" follow-up into this patch and pulled
the result into master. I also rebased testing appropriately; feel
free to perform the necessary fscrypt-related fixups there!

Thanks,

                Ilya
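
For reference, the existing guard in ceph_read_iter() that the question
above refers to looks roughly like this (a simplified sketch, not a
verbatim copy of fs/ceph/file.c; the surrounding read path is elided):

static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
        struct inode *inode = file_inode(iocb->ki_filp);

        /* a shutdown (evicted/blocklisted) client fails the read up
         * front, albeit with -ESTALE rather than -EIO */
        if (ceph_inode_is_shutdown(inode))
                return -ESTALE;

        /* ... normal read path, eventually calling ceph_sync_read() ... */
}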