On Mon, Oct 29, 2012 at 11:55:15AM +0100, Juerg Haefliger wrote:
> Hi,
>
> I have a node that used to crash every day at 6:25am in xfs_cmn_err
> (Null pointer dereference).

Stack trace, please.

> 1) I was under the impression that during the mounting of an XFS
> volume some sort of check/repair is performed. How does that differ
> from running xfs_check and/or xfs_repair?

Journal recovery is performed at mount time, not a consistency
check.

http://en.wikipedia.org/wiki/Filesystem_journaling

> 2) Any ideas how the filesystem might have gotten into this state? I
> don't have the history of that node but it's possible that it crashed
> previously due to an unrelated problem. Could this have left the
> filesystem in this state?

<shrug>

How long is a piece of string?

> 3) What exactly does the output of the xfs_check mean? How serious is
> it? Are those warnings or errors? Will some of them get cleaned up
> during the mounting of the filesystem?

xfs_check is deprecated. The output of xfs_repair indicates
cross-linked extent indexes. These will only get properly detected
and fixed by xfs_repair. And "fixed" may mean corrupt files are
removed from the filesystem - repair does not guarantee that your
data is preserved or consistent after it runs, just that the
filesystem is consistent and error free.

> 4) We have a whole bunch of production nodes running the same kernel.
> I'm more than a little concerned that we might have a ticking timebomb
> with some filesystems being in a state that might trigger a crash
> eventually. Is there any way to perform a live check on a mounted
> filesystem so that I can get an idea of how big of a problem we have
> (if any)?

Read the xfs_repair man page?

       -n     No modify mode. Specifies that xfs_repair should not
              modify the filesystem but should only scan the
              filesystem and indicate what repairs would have been
              made.
       .....
       -d     Repair dangerously. Allow xfs_repair to repair an XFS
              filesystem mounted read only. This is typically done
              on a root filesystem from single user mode,
              immediately followed by a reboot.

So, remounting read only and running xfs_repair -d -n will check the
filesystem as best as can be done online. If there are any problems,
then you can repair them and immediately reboot.

> I don't claim to know exactly what I'm doing but I picked a
> node, froze the filesystem and then ran a modified xfs_check (which
> bypasses the is_mounted check and ignores non-committed metadata) and
> it did report some issues. At this point I believe those are false
> positives. Do you have any suggestions short of rebooting the nodes and
> running xfs_check on the unmounted filesystem?

Don't bother with xfs_check. xfs_repair will detect all the same
errors (and more) and can fix them at the same time.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
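
The read-only check described above (remount read only, then
xfs_repair -d -n) could be scripted across a set of nodes roughly as
follows. This is only a sketch: the helper names, the mount point
/data and the device /dev/sdb1 are made up for illustration, and by
default it only prints the commands rather than running them.

```python
import subprocess


def build_check_commands(mountpoint, device):
    """Return the two commands for a read-only, no-modify check."""
    return [
        # Remount read only so that xfs_repair -d will accept the filesystem.
        ["mount", "-o", "remount,ro", mountpoint],
        # -d: allow a filesystem mounted read only; -n: scan and report only.
        ["xfs_repair", "-d", "-n", device],
    ]


def check_filesystem(mountpoint, device, dry_run=True):
    """Print the commands; run them only when dry_run is False (needs root).

    Returns the exit status of xfs_repair, or None in dry-run mode.
    """
    remount, repair = build_check_commands(mountpoint, device)
    for cmd in (remount, repair):
        print(" ".join(cmd))
    if dry_run:
        return None
    # Stop here if the remount fails, e.g. because files are open for writing.
    subprocess.run(remount, check=True)
    return subprocess.run(repair).returncode


if __name__ == "__main__":
    # Hypothetical mount point and device, for illustration only.
    check_filesystem("/data", "/dev/sdb1")
```

As far as I recall from the man page, xfs_repair run with -n exits
with status 1 when it detects corruption and 0 when it does not, so
the return value can be used to flag nodes that need a real repair
and reboot. Check the man page for the installed version before
relying on that.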