Re: Zombie / Orphan open files

> On Jan 31, 2023, at 1:33 PM, Andrew J. Romero <romero@xxxxxxxx> wrote:
> 
>> That's not the way state recovery works. Clients will reopen only
>> the files that are still in use. If the clients don't open the
>> "zombie" files again, then I'm fairly certain the applications
>> have already closed those files.
> 
> Hi
> 
> In the case of my test script, I know that the files were not
> closed explicitly or on script termination (the script terminated
> without credentials). By the time my session re-acquired credentials
> (intentionally after process termination), the process was already terminated
> and nothing on the client would ever attempt to clean up the
> server-side "zombie open files".
> 
> The server-side pool usage caused by my intentionally
> bad test script was not freed up until I did the cluster resource migration.
> 
> Question:
> When a simple app (for example, a Python script) on the NFS client
> simply opens a text file, is a lease automatically created, behind
> the scenes, on the server? If so, is the server responsible for doing this:
> if the lease isn't renewed every N minutes, close the file?

Almost. The protocol requires:

After the client reboots, when it opens its first file, the client
does a SETCLIENTID or EXCHANGE_ID to establish its lease on the
server. All OPEN and LOCK state is managed under the umbrella of
that lease (and that includes all files that client is managing).
The client keeps the lease alive by renewing the lease every minute.

If the client reboots (i.e., does a subsequent SETCLIENTID or
EXCHANGE_ID with a new boot verifier), the server has to purge all
open file state for that client.

If the client fails to renew its lease, the server is free to do
what it wants -- it can purge the client's lease immediately, or
it can wait until conflicting opens or locks come from other
clients and then purge some or all of that client's lease.

If the client can't or doesn't CLOSE a file, the open state for that
file will remain on the server until the client tells the server
(implicitly by not renewing or explicitly with a fresh ID) that the
state is no longer needed; or until the server reboots and the client
does not re-establish the OPEN state.

Therefore, rebooting individual clients that have accrued these
zombie files should also clear them out without interrupting the
file service for everyone else.

But again, we need some way to confirm exactly how this is
happening. Can you post your script, or capture client-server
network traffic while the script does its thing?


> By "simply opens" a text file, I mean that the script contains no
> code to request or in any way explicitly use locks.


--
Chuck Lever






