On 09/19/2013 02:22 AM, Vyacheslav Dubeyko wrote:
> On Wed, 2013-09-18 at 12:26 -0400, Michael L. Semon wrote:
>
> [snip]
>>>
>>> As far as I can see, your NILFS2 file system was remounted in RO mode
>>> because of internal error. Could you confirm my understanding?
>>
>> Yes, but only on reboot. Other programs crash the PC, and NILFS2 has to
>> recover from that crash. The PC spends a lot of time running xfstests and
>> LTP with a kernel that is set to panic. NILFS2 itself seems OK, and its
>> latest xfstests run looked good, using default mkfs.nilfs2 options and
>> mounting with "-o pp=0".
>
> [snip]
>>
>> It is strictly like this so far:
>>
>> 1) NILFS2 / boots OK
>> 2) no problems
>> 3) shutdown is OK
>> 4) NILFS2 / boots OK
>> 5) computer crashes for some other reason
>> 6) NILFS2 / boots OK, but displays a message that recovery was used
>> 7) no problems
>> 8) here, shutdown may hang on sync or umount (50% chance)
>>
>> In other words, NILFS2 has not had an error to make it remount read-only
>> while the PC is running. The problem may solve itself over time, or I
>> may have to boot to another partition, then mount and umount the NILFS2
>> partition to get it to recover and umount cleanly again.
>>
>
> So, maybe it is another issue.
>
> [snip]
>>
>> I'll try your patches tonight and report back in 1-2 days.
>>
>
> Ok. Please, inform me about the result anyway. If suggested patches
> don't fix the issue then I will begin investigation.
>
> But, I begin to suspect presence of another issue after additional
> analysis of provided by you outputs. So, I am waiting results of your
> attempt.
>
> Thanks,
> Vyacheslav Dubeyko.

The issue still happens. One patch was already in the kernel, and the
second patch you mentioned did not make much of a difference. The second
patch is still installed, though. The problem I mentioned above is the one
that is easy to explain.
The crash doesn't even have to stress the computer: a simple SysRq-induced
crash should be enough to get the problem started, though the PC might need
to be crashed more than once. I've changed / to mount with errors=panic,
but there has been no panic yet.

# ================

Here is where the overall problem becomes hard to explain. Consider this
scenario:

    / is NILFS2 (rw,order=strict)
    /boot is JFS
    /tmp is JFS
    /usr/src is JFS

Because I don't want a hung NILFS2 umount to cause problems for /tmp and
/usr/src, I adapted the end of the standard Slackware shutdown script to
look something like this:

    /bin/umount -v -a -t noproc,nosysfs,nonilfs2
    # This line can be here to show a sync problem, or removed
    # to show a umount problem....
    sync
    /bin/umount -v -a -t nilfs2
    echo "Remounting root filesystem read-only."
    /bin/mount -v -n -o remount,ro /dev/sdb12 /

[I can get you the exact script next time.]

Now suppose I build a kernel, which fills memory, exercises a JFS
filesystem, and probably writes temp files to /tmp on JFS. `make install`
installs the kernel to /boot on JFS. [BTW, `make install` can stall when
/boot is within a NILFS2 / partition, but that has not been tested since I
started using a separate /boot partition.]

After such a build, there is a much higher chance that shutdown will hang,
and the hang comes before the NILFS2 partitions are unmounted. A simple
`mount` placed before the `sync` shows that umount is honoring the
"nonilfs2" type filter: the NILFS2 partitions are still mounted. So why
would the sync *before* the umount of the NILFS2 partitions get hung
between segctord and sync, when umount supposedly has not unmounted the
NILFS2 partitions yet? This is why I mentioned the sync issue and the
umount issue at the same time.

Could it be that `umount ... nonilfs2` modifies /etc/mtab, which lives on
the NILFS2 /, but that update is not finished in time for sync (or the
next `umount ... nilfs2`) to complete? I'm only speculating on this idea.

Thanks!
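The `mount` check before the `sync` could also be done against the kernel's own mount table, which does not depend on /etc/mtab (on the NILFS2 root) having been rewritten yet. A minimal sketch, assuming a POSIX shell and awk; `list_nilfs2_mounts` is a hypothetical helper name, not part of the Slackware shutdown scripts:

```shell
#!/bin/sh
# Hypothetical helper (not part of the Slackware shutdown scripts):
# print the mount point of every nilfs2 filesystem listed in a mounts
# table. /proc/mounts is generated by the kernel, so it reflects what is
# really mounted regardless of the state of /etc/mtab.
list_nilfs2_mounts() {
    # $1: mounts table to read; defaults to /proc/mounts
    _table=${1:-/proc/mounts}
    # Fields are: device, mount point, fs type, options, dump, pass.
    # Mount points containing spaces are printed octal-escaped (\040),
    # exactly as they appear in the table.
    awk '$3 == "nilfs2" { print $2 }' "$_table"
}

# Possible use just before the `sync` in the shutdown script:
#   echo "NILFS2 still mounted (kernel view):"
#   list_nilfs2_mounts
```

Comparing this output with that of a plain `mount` at the same point would show whether the kernel's view and /etc/mtab agree right before the `sync` that hangs.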
Michael