On Tue, 2023-01-31 at 17:14 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 31, 2023 at 2:55 PM Andrew J. Romero <romero@xxxxxxxx> wrote:
> >
> > > What you are describing sounds like a bug in a system (be it client or
> > > server). There is state that the client thought it closed but the
> > > server is still keeping that state.
> >
> > Hi Olga
> >
> > Based on my simple test script experiment,
> > here's a summary of what I believe is happening:
> >
> > 1. An interactive user starts a process that opens a file or multiple files.
> >
> > 2. A disruption that prevents
> > NFS-client <-> NFS-server communication
> > occurs while the file is open. This could be due to
> > having the file open a long time or due to opening the file
> > too close to the time of disruption.
> >
> > ( I believe the most common "disruption" is
> > credential expiration. )
> >
> > 3. The user's process terminates before the disruption
> > is cleared (or, stated another way, the disruption is not cleared until after the user
> > process terminates).
> >
> > At the time the user process terminates, the process
> > cannot tell the server to close the server-side file state.
> >
> > After the process terminates, nothing will ever tell the server
> > to close the files. The now-zombie open files will continue to
> > consume server-side resources.
> >
> > In environments with many users, the problem is significant.
> >
> > My reasons for posting are not to have your team help troubleshoot
> > my specific issue ( that would be quite rude ); they are:
> >
> > - Determine if my NAS vendor might be accidentally
> > not doing something they should be.
> > ( I now don't really think this is the case. )
>
> It's hard to say who's at fault here without having some more info
> like tracepoints or network traces.
>
> > - Determine if this is a known behavior common to all NFS implementations
> > ( Linux, etc. ) and, if so, have your team determine if this is a problem that
> > should be addressed in the spec and the implementations.
>
> What you describe -- having different views of state on the client
> and server -- is not a known common behaviour.
>
> I have tried it on my Kerberos setup.
> I got a 5 minute ticket.
> As a user, I opened a file in a process that then went to sleep.
> My user credentials expired (after 5 mins). I verified that by
> doing an "ls" on a mounted filesystem, which resulted in a permission
> denied error.
> Then I killed the application that had an open file. This resulted
> in an NFS CLOSE being sent to the server using the machine's gss
> context (which is the default behaviour of the linux client regardless
> of whether or not the user's credentials are valid).
>
> Basically, as far as I can tell, a linux client can handle cleaning up
> state when the user's credentials have expired.

That's pretty much what I expected from looking at the code. I think
this is done via the call to nfs4_state_protect. That calls:

    if (test_bit(sp4_mode, &clp->cl_sp4_flags)) {
            msg->rpc_cred = rpc_machine_cred();
            ...
    }

Could it be that cl_sp4_flags doesn't have NFS_SP4_MACH_CRED_CLEANUP
set on his clients? AFAICT, that comes from the server.

It also looks like cl_sp4_flags may not get set on an NFSv4.0 mount.
Olga, can you test that with a v4.0 mount?
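For reference, the surrounding logic I'm thinking of looks roughly like
this (paraphrased from memory rather than quoted verbatim from
fs/nfs/nfs4proc.c, with the AUTH_SYS special-casing and sanity checks
trimmed):

    /* Rough sketch: if the server granted this sp4_mode bit (e.g.
     * NFS_SP4_MACH_CRED_CLEANUP for CLOSE and related cleanup ops),
     * re-credential the call with the machine cred. */
    static void nfs4_state_protect(struct nfs_client *clp, unsigned long sp4_mode,
                                   struct rpc_clnt **clntp, struct rpc_message *msg)
    {
            if (test_bit(sp4_mode, &clp->cl_sp4_flags)) {
                    /* swap the (possibly expired) user cred for the machine cred */
                    msg->rpc_cred = rpc_machine_cred();
                    /* and send the call over the client's state-management rpc_clnt */
                    *clntp = clp->cl_rpcclient;
            }
    }

IIRC, cl_sp4_flags is only populated from the server's EXCHANGE_ID
reply (nfs4_sp4_select_mode), and EXCHANGE_ID is a v4.1+ operation,
which would explain why a v4.0 mount never gets
NFS_SP4_MACH_CRED_CLEANUP set.
--
Jeff Layton <jlayton@xxxxxxxxxx>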