On Tue, 2013-12-10 at 12:14 -0500, Michael Conrad wrote:
> I think this will help. However, the idea I had in mind originally was
> for nilfs to "give up" sooner.
>
> I suspect that my nilfs partition became corrupt for hardware or
> hardware-driver reasons. So let's ignore that part for now.
>
> With the data on the drive being corrupt, it appeared that nilfs
> encountered an invalid directory (possibly just a long string of NUL
> bytes?) and emitted more than a million errors about invalid structures,
> triggering the soft-lockup watchdog and rebooting the system. When I
> recompiled my kernel with soft-lockup set to 5 minutes, it simply filled
> my log files.
>
> [10796.519283] NILFS error (device sdf1): nilfs_check_page: bad entry in
> directory #2383620: rec_len is smaller than minimal - offset=1143304192,
> inode=0, rec_len=0, name_len=0
>
> I haven't read the code involved, but what I think should happen is that
> on the very *first* error, it should return an I/O error to userland.
> Also, the partition was set to "errors=remount-ro", so the very first
> error should also make the filesystem read-only, correct?
>

I think your view is correct. I have been considering a similar solution
myself. In any case, such fixes can only be made locally, for each
concrete use case. Changing the error messages correctly is easy, but
suggesting a proper fix without being able to reproduce the issue is
impossible.

Anyway, I'll try to work out a fix, but my hands are busy with another
task right now.

Thanks,
Vyacheslav Dubeyko.
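For illustration, the fail-fast behaviour suggested in the quoted report (stop at the first bad directory entry, return an I/O error, and switch the filesystem to read-only) can be sketched as a small userspace C program. This is not nilfs code. The record layout `struct dir_rec`, the `MIN_REC_LEN` value, the `fs_read_only` flag and the `check_dir_block()` function are all hypothetical simplifications. The real on-disk format and error path live in fs/nilfs2 and differ in detail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified directory record header.
 * NOT the real on-disk nilfs directory entry layout. */
struct dir_rec {
    uint32_t inode;
    uint16_t rec_len;   /* length of this record, including the name */
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Hypothetical minimal record length (8-byte header plus at least one
 * name byte, rounded up to 8). The real kernel value may differ. */
#define MIN_REC_LEN 16

/* Simulated "errors=remount-ro" state: set once the first error is seen. */
static bool fs_read_only = false;

/* Walk the records in one directory block. On the FIRST malformed record,
 * log a single line, flip the filesystem to read-only and return -EIO,
 * instead of continuing to scan (and log) the rest of the block. */
static int check_dir_block(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    while (off + sizeof(struct dir_rec) <= len) {
        struct dir_rec rec;

        memcpy(&rec, buf + off, sizeof(rec));

        if (rec.rec_len < MIN_REC_LEN || off + rec.rec_len > len) {
            fprintf(stderr,
                    "bad entry: rec_len is invalid - offset=%zu, "
                    "inode=%u, rec_len=%u, name_len=%u\n",
                    off, (unsigned)rec.inode, (unsigned)rec.rec_len,
                    (unsigned)rec.name_len);
            fs_read_only = true;   /* emulate errors=remount-ro */
            return -EIO;           /* give up immediately */
        }
        off += rec.rec_len;        /* advance to the next record */
    }
    return 0;
}
```

With a block consisting only of NUL bytes, as suspected in the quoted report, `check_dir_block()` prints one line, sets `fs_read_only` and returns `-EIO` on the very first record, rather than producing an error per record. A real fix would also need every caller to propagate that error instead of retrying, which is the per-use-case work the reply refers to.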