Re: Zombie / Orphan open files

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 31 Jan 2023 13:13:34 -0500

On Tue, 2023-01-31 at 16:34 +0000, Chuck Lever III wrote:
> 
> > On Jan 31, 2023, at 9:42 AM, Andrew J. Romero <romero@xxxxxxxx> wrote:
> > 
> > In a large campus environment, usage of the relevant memory pool will eventually get so
> > high that a server-side reboot will be needed.
> 
> The above is sticking with me a bit.
> 
> Rebooting the server should force clients to re-establish state.
> 
> Are they not re-establishing open file state for users whose
> ticket has expired? I would think each client would re-establish
> state for those open files anyway, and the server would be in the
> same overcommitted state it was in before it rebooted.
> 
> We might not have an accurate root cause analysis yet, or I could
> be missing something.
> 

My assumption was that the client wasn't able to get credentials to run
the CLOSE RPC in this case, so it couldn't send the call at all. That's
a big assumption though. It'd be good to confirm this.

It looks like the CLOSE codepath on the client calls nfs4_state_protect
with NFS_SP4_MACH_CRED_CLEANUP, and that should make it use the machine
cred? I'm not 100% clear here though. It looks like that may be
conditional on what the server sent in its EXCHANGE_ID reply.
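
To make that conditional concrete, here's a rough toy model of how I read
the logic. It is not the kernel code: the function names, the set of ops
assumed to be required, and the AUTH_SYS shortcut are all assumptions for
illustration. Only the operation numbers (from RFC 5661) and the idea that
the server advertises an spo_must_allow list in EXCHANGE_ID are taken as
given.

```python
# Toy model only: names and the required-op set are assumptions,
# not the actual client implementation.
OP_CLOSE = 4         # NFSv4.1 operation numbers from RFC 5661
OP_DELEGRETURN = 8
OP_LOCKU = 14

# Assumed: cleanup mode needs all of these in the server's allow list.
CLEANUP_OPS = {OP_CLOSE, OP_DELEGRETURN, OP_LOCKU}


def cleanup_mode_enabled(spo_must_allow):
    """True if the server's EXCHANGE_ID reply allows every cleanup op."""
    return CLEANUP_OPS <= set(spo_must_allow)


def pick_cred(user_cred, machine_cred, spo_must_allow, auth_is_sys=False):
    """Choose the credential for a CLOSE compound in this model."""
    # Assumed shortcut: with AUTH_SYS the user credential cannot
    # expire, so there is no reason to switch away from it.
    if auth_is_sys:
        return user_cred
    if cleanup_mode_enabled(spo_must_allow):
        return machine_cred
    return user_cred
```

In this model, a server that lists only CLOSE in spo_must_allow leaves the
client on the user's (possibly expired) credential, which would fit the
symptom being described if that is really what's happening.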

FWIW, I don't see any reason we shouldn't use the machine cred for the
close compound. Nothing we do in there should require permission
checking.

BTW: is this NFSv4.0 or v4.1+ (or a mix)?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>