On Mon, Jan 17, 2022 at 06:14:30PM +1030, Jonathan Woithe wrote:
> > >>>>> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each
> > >>>>> nlm_file"), which was introduced in or around v5.15. You could try a
> > >>>>> simple test and back the server down to v5.14.y to see if the problem
> > >>>>> persists.
> > >
> > > FYI I have now put the kernel.org 5.14.21 kernel on the affected system and
> > > booted it. Since the oops has taken between 1 and 2 weeks to be triggered
> > > in the past, we may have to wait a few weeks to be certain of an outcome.
> > > If there's anything else you need from me in the interim please ask.
> >
> > If you identify a particular client that triggers the issue, it would be
> > helpful to know:
> >
> > - The client's kernel version
> > - What was running on the client before it was shut down
> > - Whether the application and client shut down was clean
>
> I have been able to identify the client involved. It was the same client
> on both occasions. That client is running the 4.4.14 kernel.
> :
> I will ask the user if they remember anything happening differently on the
> days of the server oops.

I have asked the user, and certainly in the case of the most recent oops
the previous day's usage (that is, the day of the unclean shutdown, the day
before the boot which triggered the server oops) was nothing out of the
ordinary. Firefox, thunderbird and libreoffice were the only applications
used, with the desktop file browser also getting an outing. The desktop is
xfce4. These programs would have been used variously over the course of
the day (roughly 7.5 hours on this particular date).

> With the server running 5.14.21, I did a reset of the client (that is,
> unclean shutdown) just before I left this evening. The server did not oops
> when the client was rebooted a minute or so later. I will see if I can
> repeat the test with 5.15.12 tomorrow morning before others get in if you
> think that will be helpful in light of the above observations.

I did this test this morning before others came in. The server (with
5.15.12 running) did not oops. However, with the recent mention of locking
this may not be surprising since no NFS locking had been attempted on the
client during the test (mainly because I had no easy way to elicit a lock).
I merely booted the client, reset it and let it boot again. During the
course of the day the client will run firefox, thunderbird and libreoffice,
all of which probably involve locking of various descriptions. Thus a test
without locking is perhaps not perfect.

I am happy to run further tests if it will help. Let me know if I can do
anything else.

Regards
  jonathan
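
One way a lock could be elicited by hand for a repeat of the reset test is sketched below. This is only an illustration and not something taken from the thread: it assumes the client has Python 3 available and that the export is mounted with NFSv3 and without the `nolock` option, so that a POSIX byte-range lock taken with `fcntl` is forwarded to the server's lockd. The function names and the example path are hypothetical.

```python
import fcntl
import os
import time


def acquire_lock(path):
    """Open (or create) path and take an exclusive POSIX lock on it.

    Returns the open file descriptor. The lock is held for as long as
    the descriptor stays open in this process.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lockf() wraps the fcntl() byte-range locking calls. LOCK_NB
        # makes the call fail at once instead of blocking if some other
        # process already holds a conflicting lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def hold_lock_forever(path):
    """Take the lock and sleep so it is still held when the client is reset."""
    fd = acquire_lock(path)
    print("lock held on", path, "(fd", str(fd) + "); reset the client now")
    while True:
        time.sleep(60)
```

To use it, one would call `hold_lock_forever()` with a file on the NFS mount (for example a hypothetical `/mnt/nfs/locktest`), reset the client while the script is still running, and then watch the server as the client boots again. Whether a lock left outstanding across the unclean shutdown is what actually triggers the oops remains an assumption to be tested.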