On Mon, Feb 7, 2022 at 11:56 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote: > > > On 2/7/22 11:12 PM, Jeff Layton wrote: > > On Mon, 2022-02-07 at 13:03 +0800, xiubli@xxxxxxxxxx wrote: > >> From: Xiubo Li <xiubli@xxxxxxxxxx> > >> > >> If MDS return ESTALE, that means the MDS has already iterated all > >> the possible active MDSes including the auth MDS or the inode is > >> under purging. No need to retry in auth MDS and will just return > >> ESTALE directly. > >> > > When you say "purging" here, do you mean that it's effectively being > > cleaned up after being unlinked? Or is it just being purged from the > > MDS's cache? > > There is one case when the client just removes the file or the file is > force overrode via renaming, the related dentry in MDS will be put to > stray dir and the Inode will be marked as under purging. I'm confused by this statement. Renaming inode B over inode A should still just be a normal unlink for A, and not trigger an immediate purge. Is there something in the MDS that force-starts that and breaks clients who still have the inode held open? -Greg > So in case when > the client try to call getattr, for example, after that, or retries the > pending getattr request, which can cause the MDS returns ESTALE errno in > theory. > > Locally I still haven't reproduced it yet. > > >> Or it will cause definite loop for retrying it. > >> > >> URL: https://tracker.ceph.com/issues/53504 > >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> > >> --- > >> fs/ceph/mds_client.c | 29 ----------------------------- > >> 1 file changed, 29 deletions(-) > >> > >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > >> index 93e5e3c4ba64..c918d2ac8272 100644 > >> --- a/fs/ceph/mds_client.c > >> +++ b/fs/ceph/mds_client.c > >> @@ -3368,35 +3368,6 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg) > >> > >> result = le32_to_cpu(head->result); > >> > >> - /* > >> - * Handle an ESTALE > >> - * if we're not talking to the authority, send to them > >> - * if the authority has changed while we weren't looking, > >> - * send to new authority > >> - * Otherwise we just have to return an ESTALE > >> - */ > >> - if (result == -ESTALE) { > >> - dout("got ESTALE on request %llu\n", req->r_tid); > >> - req->r_resend_mds = -1; > >> - if (req->r_direct_mode != USE_AUTH_MDS) { > >> - dout("not using auth, setting for that now\n"); > >> - req->r_direct_mode = USE_AUTH_MDS; > >> - __do_request(mdsc, req); > >> - mutex_unlock(&mdsc->mutex); > >> - goto out; > >> - } else { > >> - int mds = __choose_mds(mdsc, req, NULL); > >> - if (mds >= 0 && mds != req->r_session->s_mds) { > >> - dout("but auth changed, so resending\n"); > >> - __do_request(mdsc, req); > >> - mutex_unlock(&mdsc->mutex); > >> - goto out; > >> - } > >> - } > >> - dout("have to return ESTALE on request %llu\n", req->r_tid); > >> - } > >> - > >> - > >> if (head->safe) { > >> set_bit(CEPH_MDS_R_GOT_SAFE, &req->r_req_flags); > >> __unregister_request(mdsc, req); > > > > (cc'ing Greg, Sage and Zheng) > > > > This patch sort of contradicts the original design, AFAICT, and I'm not > > sure what the correct behavior should be. I could use some > > clarification. > > > > The original code (from the 2009 merge) would tolerate 2 ESTALEs before > > giving up and returning that to userland. Then in 2010, Greg added this > > commit: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e55b71f802fd448a79275ba7b263fe1a8639be5f > > > > ...which would presumably make it retry indefinitely as long as the auth > > MDS kept changing. Then, Zheng made this change in 2013: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca18bede048e95a749d13410ce1da4ad0ffa7938 > > > > ...which seems to try to do the same thing, but detected the auth mds > > change in a different way. > > > > Is that where livelock detection was broken? Or was there some > > corresponding change to __choose_mds that should prevent infinitely > > looping on the same request? > > > > In NFS, ESTALE errors mean that the filehandle (inode) no longer exists > > and that the server has forgotten about it. Does it mean the same thing > > to the ceph MDS? > > > > Has the behavior of the MDS changed such that these retries are no > > longer necessary on an ESTALE? If so, when did this change, and does the > > client need to do anything to detect what behavior it should be using? >