On Tue, 2020-09-29 at 12:58 +0200, Ilya Dryomov wrote:
> On Tue, Sep 29, 2020 at 12:44 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > not fully convinced it's the right approach.
> > > > >
> > > >
> > > > The root cause is that the client does not recover the session instantly
> > > > after getting rejected by the MDS. Until the session is recovered, the
> > > > client continues to return errors.
> > >
> > > Hi Zheng,
> > >
> > > I don't think it's about whether that happens instantly or not.
> > > In the example from [1], the first "ls" would fail even if issued
> > > minutes after the session reject message and the reconnect. From
> > > the user's POV it is well after the automatic recovery promised by
> > > recover_session=clean.
> > >
> > > [1] https://tracker.ceph.com/issues/47385
> >
> > Reconnect should close all old sessions. It's likely that the
> > client didn't detect that it was blacklisted.
>
> Sorry, I should have pasted dmesg there as well. It _does_ detect
> blacklisting -- notice that I wrote "after the session reject message
> and the reconnect".
>

Yep, this is pretty easy to reproduce too (as Ilya points out in the
tracker). I'm open to other ways of smoothing this over. If we end up
with a small window where errors can occur, then so be it, but I think
we can probably do better than we have now.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>