> -----Original Message-----
> From: Chuck Lever III <chuck.lever@xxxxxxxxxx>
>
> > On Jan 31, 2023, at 9:42 AM, Andrew J. Romero <romero@xxxxxxxx> wrote:
> >
> > In a large campus environment, usage of the relevant memory pool will eventually get so
> > high that a server-side reboot will be needed.
>
> The above is sticking with me a bit.
>
> Rebooting the server should force clients to re-establish state.
>
> Are they not re-establishing open file state for users whose
> ticket has expired? I would think each client would re-establish
> state for those open files anyway, and the server would be in the
> same overcommitted state it was in before it rebooted.

When the number of opens gets close to the limit that would cause a disruptive
NFSv4 service interruption (currently the limit is 128K open files), I do the
reboot. (More precisely, I transfer the affected NFS serving resource from one
NAS cluster node to the other; in my experience this amounts to a 99.9%
"non-disruptive reboot" of that NFS serving resource.)

Before the resource transfer there will be ~126K open files (from the NAS
perspective). A fraction of a second after the transfer there will be close to
zero files open. Within a few seconds there will be ~2000, and within a few
minutes ~2100. Over the rest of the day I only see a slow rise in the average
number of opens, to maybe 2200.

(My take is that ~2100 files were "active opens" both before and after the
resource transfer; the rest of the 126K opens were zombies that the clients
were no longer using.)

Over the next 4-6 months the number of opens, from the NAS perspective, slowly
creeps back up to the limit.

> We might not have an accurate root cause analysis yet, or I could
> be missing something.
>
> --
> Chuck Lever
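
In case it helps anyone try to reproduce or observe this on a Linux knfsd
server (our NAS is a proprietary appliance, so I can only see its aggregate
counter), below is a rough sketch of how per-client open-state counts could be
tallied from /proc/fs/nfsd/clients/ (available on kernels 5.3+). The record
format of the per-client "states" file, including the "type: open" field
matched below, is an assumption based on recent kernels; adjust as needed.

#!/usr/bin/env python3
# Rough sketch: tally NFSv4 state records per client on a Linux knfsd
# server via /proc/fs/nfsd/clients/ (kernel 5.3+). Run as root.
# NOTE: the exact "states" record format may differ between kernel
# versions; the "type: open" match is an assumption, not guaranteed.

import glob
import os

def count_states(states_path):
    """Count total state records, and those that look like opens."""
    total = opens = 0
    try:
        with open(states_path) as f:
            for line in f:
                # Each record appears to start with "- 0x...: { ... }"
                if line.lstrip().startswith("-"):
                    total += 1
                    if "type: open" in line:
                        opens += 1
    except OSError:
        pass  # a client may expire while we are scanning
    return total, opens

def main():
    grand_total = grand_opens = 0
    for clientdir in sorted(glob.glob("/proc/fs/nfsd/clients/*/")):
        total, opens = count_states(os.path.join(clientdir, "states"))
        grand_total += total
        grand_opens += opens
        print(f"{clientdir}: {total} state records ({opens} opens)")
    print(f"TOTAL: {grand_total} state records ({grand_opens} opens)")

if __name__ == "__main__":
    main()

Run periodically (e.g. from cron) and logged, something like this would show
which clients are the ones slowly accumulating the never-released opens, which
might help with the root-cause question above.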