Re: [Bug report] Recurring oops, 5.15.x, possibly during or soon after client mount

On Mon, Jan 17, 2022 at 06:14:30PM +1030, Jonathan Woithe wrote:
> > >>>>> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each
> > >>>>> nlm_file"), which was introduced in or around v5.15.  You could try a
> > >>>>> simple test and back the server down to v5.14.y to see if the problem
> > >>>>> persists.
> > > 
> > > FYI I have now put the kernel.org 5.14.21 kernel on the affected system and
> > > booted it.  Since the oops has taken between 1 and 2 weeks to be triggered
> > > in the past, we may have to wait a few weeks to be certain of an outcome. 
> > > If there's anything else you need from me in the interim please ask.
> > 
> > If you identify a particular client that triggers the issue, it would be
> > helpful to know:
> > 
> > - The client's kernel version
> > - What was running on the client before it was shut down
> > - Whether the application and client shut down was clean
> 
> I have been able to identify the client involved.  It was the same client
> on both occasions.  That client is running the 4.4.14 kernel.
> :
> I will ask the user if they remember anything happening differently on the
> days of the server oops.

I have asked the user, and certainly in the case of the most recent oops the
previous day's usage (that is, the day of the unclean shutdown, the day
before the boot which triggered the server oops) was nothing out of the
ordinary.  Firefox, Thunderbird and LibreOffice were the only applications
used, with the desktop file browser also getting an outing.  The desktop is
xfce4.  These programs would have been used at various times over the course
of the day (roughly 7.5 hours on this particular date).

> With the server running 5.14.21, I did a reset of the client (that is,
> unclean shutdown) just before I left this evening.  The server did not oops
> when the client was rebooted a minute or so later.  I will see if I can
> repeat the test with 5.15.12 tomorrow morning before others get in if you
> think that will be helpful in light of the above observations.

I did this test this morning before others came in.  The server (running
5.15.12) did not oops.  However, given the recent mention of locking, this
may not be surprising: no NFS locking was attempted on the client during the
test, mainly because I had no easy way to elicit a lock.  I merely booted the
client, reset it and let it boot again.

During a normal day the client runs Firefox, Thunderbird and LibreOffice,
all of which probably take file locks of various kinds.  A test without any
locking is therefore not fully representative.
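One way to elicit a lock from the client in such a test would be a small
program that takes an exclusive POSIX lock on a file in an NFS-mounted
directory.  This is a minimal sketch, assuming Python 3 is available on the
client and that the mount uses NFSv3, where the client forwards POSIX locks to
the server's lockd (NLM).  The function name `exercise_posix_lock` is
hypothetical, and on a local filesystem the same code simply locks locally.

```python
import fcntl
import os
import time


def exercise_posix_lock(path: str, hold_seconds: float = 1.0) -> bool:
    """Take, hold, then release an exclusive POSIX (fcntl) lock on `path`.

    Assumption: on an NFSv3 mount the client sends this lock to the
    server's lockd (NLM); on a local filesystem it is handled locally.
    Returns True once the lock has been acquired and released.
    """
    # Open read/write so an exclusive (write) lock is permitted.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocking whole-file exclusive lock (no LOCK_NB, so it waits).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        # Hold the lock for a while, as an application might.
        time.sleep(hold_seconds)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

For example, calling `exercise_posix_lock("/home/someuser/locktest.tmp",
hold_seconds=600)` (hypothetical path) and resetting the client while the lock
is held should leave a lock outstanding on the server across an unclean
shutdown, which may be closer to what happens after a normal day's use.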

I am happy to run further tests if it will help.  Let me know if I can do
anything else.

Regards
  jonathan