On Mon, Feb 7, 2022 at 8:13 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> The tracker bug mentions that this occurs after an MDS is restarted.
> Could this be the result of clients relying on delete-on-last-close
> behavior?

Oooh, I didn't actually look at the tracker.

>
> IOW, we have a situation where a file is opened and then unlinked, and
> userland is actively doing I/O to it. The thing gets moved into the
> strays dir, but isn't unlinked yet because we have open files against
> it. Everything works fine at this point...
>
> Then, the MDS restarts and the inode gets purged altogether. Client
> reconnects and tries to reclaim his open, and gets ESTALE.

Uh, okay. So I didn't do a proper audit before I sent my previous
reply, but one of the cases I did see was that the MDS returns ESTALE
if you try to do a name lookup on an inode in the stray directory. I
don't know if that's what is happening here or not? But perhaps that's
the root of the problem in this case.

Oh, nope, I see it's issuing getattr requests. That doesn't do ESTALE
directly so it must indeed be coming out of MDCache::path_traverse.

The MDS shouldn't move an inode into the purge queue on restart unless
there were no clients with caps on it (that state is persisted to disk
so it knows). Maybe if the clients don't make the reconnect window
it's dropping them all and *then* moves it into purge queue? I think
we need to identify what's happening there before we issue kernel
client changes, Xiubo?
-Greg