Re: Daily crash in xfs_cmn_err

On Mon, Oct 29, 2012 at 11:55:15AM +0100, Juerg Haefliger wrote:
> Hi,
> 
> I have a node that used to crash every day at 6:25am in xfs_cmn_err
> (Null pointer dereference).

Stack trace, please.

> 1) I was under the impression that during the mounting of an XFS
> volume some sort of check/repair is performed.  How does that differ
> from running xfs_check and/or xfs_repair?

Journal recovery is performed at mount time, not a consistency
check.

http://en.wikipedia.org/wiki/Filesystem_journaling

> 2) Any ideas how the filesystem might have gotten into this state? I
> don't have the history of that node but it's possible that it crashed
> previously due to an unrelated problem. Could this have left the
> filesystem in this state?

<shrug>

How long is a piece of string?

> 3) What exactly does the output of the xfs_check mean? How serious is
> it? Are those warnings or errors? Will some of them get cleaned up
> during the mounting of the filesystem?

xfs_check is deprecated.  The output of xfs_repair indicates
cross-linked extent indexes. These will only be properly detected and
fixed by xfs_repair. And "fixed" may mean corrupt files are removed
from the filesystem - repair does not guarantee that your data is
preserved or consistent after it runs, just that the filesystem is
consistent and error free.

> 4) We have a whole bunch of production nodes running the same kernel.
> I'm more than a little concerned that we might have a ticking timebomb
> with some filesystems being in a state that might trigger a crash
> eventually. Is there any way to perform a live check on a mounted
> filesystem so that I can get an idea of how big of a problem we have
> (if any)?

Read the xfs_repair man page?

-n     No modify mode. Specifies that xfs_repair should not
       modify the filesystem but should only scan the  filesystem
       and indicate what repairs would have been made.
.....

-d     Repair dangerously. Allow xfs_repair to repair an XFS
       filesystem mounted read only. This is typically done on a
       root filesystem from single user mode, immediately followed by
       a reboot.

So remounting read only and running xfs_repair -d -n will check the
filesystem as best as can be done online. If there are any problems,
then you can repair them and immediately reboot.
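
For checking a whole fleet of nodes, the two steps above could be
wrapped in a small script. What follows is only a sketch under stated
assumptions: the function names and the example mount point and device
are hypothetical, it has to run as root, and it relies on the
documented behaviour that xfs_repair -n exits with status 1 when it
detects corruption and 0 when it does not. The remount will fail if
files are still open for writing, and the sketch does not handle that.

```python
import subprocess


def build_check_commands(mountpoint, device):
    """Return the remount and check commands as argv lists.

    mountpoint and device are supplied by the caller; nothing here
    discovers them automatically.
    """
    return [
        # Step 1: remount the filesystem read only.
        ["mount", "-o", "remount,ro", mountpoint],
        # Step 2: -d allows a read-only mounted filesystem,
        # -n scans without modifying anything.
        ["xfs_repair", "-d", "-n", device],
    ]


def run_check(mountpoint, device, runner=subprocess.run):
    """Run both steps; return True if xfs_repair -n reported no problems.

    runner is injectable so the logic can be exercised without
    touching a real filesystem.
    """
    remount, check = build_check_commands(mountpoint, device)
    # Abort with an exception if the remount itself fails.
    runner(remount, check=True)
    # Assumption: exit status 0 means no corruption was detected.
    result = runner(check)
    return result.returncode == 0
```

Calling run_check("/data", "/dev/sdb1") (both values are placeholders)
returns False when the no-modify scan finds something, at which point
the node can be scheduled for a real repair and reboot.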

> I don't claim to know exactly what I'm doing but I picked a
> node, froze the filesystem and then ran a modified xfs_check (which
> bypasses the is_mounted check and ignores non-committed metadata) and
> it did report some issues. At this point I believe those are false
> positive. Do you have any suggestions short of rebooting the nodes and
> running xfs_check on the unmounted filesystem?

Don't bother with xfs_check. xfs_repair will detect all the same
errors (and more) and can fix them at the same time.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs