On Tue, Jan 31, 2023 at 12:12 PM Andrew J. Romero <romero@xxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
> >
> > > On Jan 31, 2023, at 9:42 AM, Andrew J. Romero <romero@xxxxxxxx> wrote:
> > >
> > > In a large campus environment, usage of the relevant memory pool will eventually get so
> > > high that a server-side reboot will be needed.
> >
> > The above is sticking with me a bit.
> >
> > Rebooting the server should force clients to re-establish state.
> >
> > Are they not re-establishing open file state for users whose
> > ticket has expired?
> >
> > I would think each client would re-establish
> > state for those open files anyway, and the server would be in the
> > same overcommitted state it was in before it rebooted.
>
>
> When the number of opens gets close to the limit which would result in
> a disruptive NFSv4 service interruption ( currently 128K open files is the limit),
> I do the reboot ( actually I transfer the affected NFS serving resource
> from one NAS cluster-node to the other NAS cluster node ... this based on experience
> is like a 99.9% "non-disruptive reboot" of the affected NFS serving resource )
>
> Before the resource transfer there will be ~126K open files
> ( from the NAS perspective )
> 0.1 seconds after the resource transfer there will be
> close to zero files open. Within a few seconds there will
> be ~2000 and within a few minutes there will be ~2100.
> During the rest of the day I only see a slow rise in the average number
> of opens to maybe 2200. ( my take is ~2100 files were "active opens" before and after
> the resource transfer , the rest of the 126K opens were zombies
> that the clients were no longer using ). In 4-6 months
> the number of opens from the NAS perspective will slowly
> creep back up to the limit.

What you are describing sounds like a bug in one of the systems (be it
client or server): there is state that the client thought it had closed
but that the server is still keeping.

>
> >
> >
> > We might not have an accurate root cause analysis yet, or I could
> > be missing something.
> >
> > --
> > Chuck Lever
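
As an aside: this kind of leak is easier to pin down on a Linux knfsd
server, where kernels since roughly 5.3 expose per-client NFSv4 state
under /proc/fs/nfsd/clients/ (an "info" file and a "states" file per
client). The sketch below is only a rough illustration of that sort of
accounting, not something from this thread; it assumes the Linux
server-side interface and the "type: open" lines in the states file,
so it does not apply to the NAS appliance described above, and the
exact file format can vary by kernel version.

#!/usr/bin/env python3
# Rough sketch (assumptions noted above): count NFSv4 open-state entries
# per client on a Linux knfsd server by scanning /proc/fs/nfsd/clients/.
import glob
import os

CLIENTS_DIR = "/proc/fs/nfsd/clients"   # per-client state, kernel 5.3+

def count_open_states(client_dir):
    """Count entries of type 'open' in one client's states file."""
    try:
        with open(os.path.join(client_dir, "states")) as f:
            return sum(1 for line in f if "type: open" in line)
    except OSError:
        return 0                         # client may disappear mid-scan

def client_address(client_dir):
    """Best-effort extraction of the client address from the info file."""
    try:
        with open(os.path.join(client_dir, "info")) as f:
            for line in f:
                if line.startswith("address:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return "?"

def main():
    totals = []
    for client_dir in sorted(glob.glob(os.path.join(CLIENTS_DIR, "*"))):
        totals.append((count_open_states(client_dir),
                       client_address(client_dir)))
    for opens, addr in sorted(totals, reverse=True):
        print(f"{opens:8d}  {addr}")
    print(f"{sum(o for o, _ in totals):8d}  TOTAL")

if __name__ == "__main__":
    main()

Run periodically, something like this would show whether the leaked
opens are concentrated on a few clients (pointing at a client-side
CLOSE problem after credential expiry) or spread across all of them.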