Re: Zombie / Orphan open files

On Tue, Jan 31, 2023 at 12:12 PM Andrew J. Romero <romero@xxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> >
> > > On Jan 31, 2023, at 9:42 AM, Andrew J. Romero <romero@xxxxxxxx> wrote:
> > >
> > > In a large campus environment, usage of the relevant memory pool will eventually get so
> > > high that a server-side reboot will be needed.
> >
> > The above is sticking with me a bit.
> >
> > Rebooting the server should force clients to re-establish state.
> >
> > Are they not re-establishing open file state for users whose
> > ticket has expired?
>
>
> > I would think each client would re-establish
> > state for those open files anyway, and the server would be in the
> > same overcommitted state it was in before it rebooted.
>
>
> When the number of opens gets close to the limit that would result in
> a disruptive NFSv4 service interruption (currently the limit is 128K open files),
> I do the reboot (actually, I transfer the affected NFS serving resource
> from one NAS cluster node to the other; based on experience, this
> is like a 99.9% "non-disruptive reboot" of the affected NFS serving resource).
>
> Before the resource transfer there will be ~126K open files
> ( from the NAS perspective )
> 0.1 seconds after the resource transfer there will be
> close to zero files open. Within a few seconds there will
> be ~2000 and within a few minutes there will be ~2100.
> During the rest of the day I only see a slow rise in the average number
> of opens, to maybe 2200. (My take is that ~2100 files were "active opens" before and after
> the resource transfer; the rest of the 126K opens were zombies
> that the clients were no longer using.) In 4-6 months
> the number of opens from the NAS perspective will slowly
> creep back up to the limit.

What you are describing sounds like a bug in the system (be it client or
server): there is state that the client believes it has closed, but the
server is still holding it.
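
As a rough back-of-the-envelope check on the figures quoted above, here is
a small sketch. It is only an estimate: it assumes "K" means 1000, 30-day
months, and roughly linear growth between resource transfers, none of which
is stated in the thread. The function names are made up for illustration.

```python
def zombie_estimate(opens_before, active_opens):
    """Return (zombie_count, zombie_fraction) for the quoted open counts."""
    zombies = opens_before - active_opens
    return zombies, zombies / opens_before


def leak_rate_per_day(opens_near_limit, baseline_opens, months, days_per_month=30):
    """Average net growth in server-side opens per day, assuming linear growth."""
    return (opens_near_limit - baseline_opens) / (months * days_per_month)


# Figures taken from the quoted message: ~126K opens before the transfer,
# ~2100 re-established afterwards, ~2200 by the end of the day, and
# 4-6 months to creep back up toward the limit.
zombies, fraction = zombie_estimate(126_000, 2_100)
slow = leak_rate_per_day(126_000, 2_200, months=6)
fast = leak_rate_per_day(126_000, 2_200, months=4)

print(f"stale opens: {zombies} ({fraction:.1%} of server-side state)")
print(f"net leak: roughly {slow:.0f} to {fast:.0f} opens per day")
```

Under those assumptions, nearly all of the server-side open state is stale,
and it accumulates at something on the order of several hundred to a
thousand opens per day across all clients.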

>
>
>
> >
> > We might not have an accurate root cause analysis yet, or I could
> > be missing something.
> >
> > --
> > Chuck Lever
> >
> >
>


