> On Jan 31, 2023, at 1:33 PM, Andrew J. Romero <romero@xxxxxxxx> wrote:
>
>> That's not the way state recovery works. Clients will reopen only
>> the files that are still in use. If the clients don't open the
>> "zombie" files again, then I'm fairly certain the applications
>> have already closed those files.
>
> Hi
>
> In the case of my test script , I know that the files were not
> closed explicitly or on script termination. ( script terminated
> without credentials ) . By the time my session re-acquired credentials
> ( intentionally after process termination) , the process was already terminated
> and nothing, on the client, would ever attempt to clean-up the
> server-side "zombie open files"
>
> The server-side pool usage caused by my intentionally
> bad test script was not freed up until I did the cluster resource migration.
>
> Question:
> When a simple app (for example a python script ) on the NFS client
> simply opens a text file, is a lease automatically, behind the scenes,
> created on the server. If so, is the server responsible for doing this:
> If the lease isn't renewed every N minutes, close the file.

Almost. The protocol requires:

After the client reboots, when it opens its first file, the client
does a SETCLIENTID or EXCHANGE_ID to establish its lease on the
server. All OPEN and LOCK state is managed under the umbrella of
that lease (and that includes all files that client is managing).

The client keeps the lease alive by renewing the lease every minute.

If the client reboots (ie, does a subsequent SETCLIENTID or
EXCHANGE_ID with a new boot verifier), the server has to purge all
open file state for that client.

If the client fails to renew its lease, the server is free to do
what it wants -- it can purge the client's lease immediately, or
it can wait until conflicting opens or locks come from other clients
and then purge some or all of that client's lease.
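
A toy model may make these rules easier to follow. The Python sketch
below is only an illustration, not how any real NFS server is
implemented. Every name in it (LeaseServer, ClientLease, LEASE_SECONDS,
the method names) is hypothetical, the 90-second lease period is an
assumed value, and the policy of purging an expired lease immediately
is just one of the choices the server is allowed to make.

```python
# Toy model of per-client lease state. All names are hypothetical.
from dataclasses import dataclass, field

LEASE_SECONDS = 90  # assumed lease period, not taken from the thread


@dataclass
class ClientLease:
    boot_verifier: int
    last_renewed: float
    open_files: set = field(default_factory=set)


class LeaseServer:
    def __init__(self):
        self.leases = {}  # client id -> ClientLease

    def establish(self, client_id, boot_verifier, now):
        """Stands in for SETCLIENTID / EXCHANGE_ID."""
        lease = self.leases.get(client_id)
        if lease is not None and lease.boot_verifier == boot_verifier:
            lease.last_renewed = now
            return lease
        # Unknown client, or same client with a new boot verifier:
        # any open state held under the old lease is discarded.
        self.leases[client_id] = ClientLease(boot_verifier, now)
        return self.leases[client_id]

    def open(self, client_id, path, now):
        lease = self.leases[client_id]
        lease.open_files.add(path)
        lease.last_renewed = now  # simplification: an OPEN also renews

    def close(self, client_id, path, now):
        lease = self.leases[client_id]
        lease.open_files.discard(path)
        lease.last_renewed = now

    def renew(self, client_id, now):
        self.leases[client_id].last_renewed = now

    def purge_expired(self, now):
        """One permitted policy: drop expired leases right away."""
        stale = [cid for cid, lease in self.leases.items()
                 if now - lease.last_renewed > LEASE_SECONDS]
        for cid in stale:
            del self.leases[cid]
        return stale

    def open_count(self):
        return sum(len(l.open_files) for l in self.leases.values())
```

In this model, a file that was opened and never closed stays counted
for as long as the client keeps calling renew(). It goes away only
when establish() is called with a new boot verifier, or when the
client stops renewing and purge_expired() runs.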

If the client can't or doesn't CLOSE that file, it will remain on
the server until the client tells it (implicitly by not renewing or
explicitly with a fresh ID) that the state is no longer needed; or
until the server reboots and the client does not re-establish the
OPEN state.

Therefore, rebooting individual clients that have accrued these
zombie files should also clear them out without interrupting the
file service for everyone else.

But again, we need some way to confirm exactly how this is happening.
Can you post your script, or capture client-server network traffic
while the script does its thing?

> By "simply opens" a text file, I mean that: the script contains no
> code to request or in any way explicitly use locks

--
Chuck Lever