On Mon, Nov 21, 2011 at 5:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> In other words, your admin basically told the system to shutdown
> without syncing the data or running shutdown scripts that sync data.
> i.e. it forces an immediate reboot while the system is still active,
> causing an unclean shutdown and guaranteed data loss.

And he's been yelled at appropriately. ;) But the data loss actually
isn't a problem for us here, as long as the filesystem isn't corrupted.

>> But I've been assured this shouldn't have been able to
>> corrupt the filesystem, so troubleshooting continues.
>
> That depends entirely on your hardware. Are you running with
> barriers enabled? If you don't have barriers active, then metadata
> corruption is entirely possible in this scenario, especially if the
> hardware does a drive reset or power cycle during the reboot
> procedure. Even with barriers, there are RAID controllers that
> enable back-end drive caches that fail to get flushed and hence
> can cause corruption on unclean shutdowns.

Barriers are on (at least, nobody turned them off), and the RAID card is
battery-backed. Here are the megacli dumps:
http://pastebin.com/yTskgzWG
http://pastebin.com/ekhczycy

Sorry if I seem too eager to assume it's an XFS bug, but Ceph is a magic
machine for taking stable filesystems and making them cry. :/

On Tue, Nov 22, 2011 at 7:06 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> Others have had good comments, but also:
>
>> 2011-11-17 16:00:37.294876 7f83f3eef720 filestore(/mnt/osd.17)
>> truncate meta/pginfo_12.7c8/0 size 0
>> 2011-11-17 16:00:37.483407 7f83f3eef720 filestore(/mnt/osd.17)
>> truncate meta/pginfo_12.7c8/0 size 0 = -117
>> 2011-11-17 16:00:37.483476 7f83f3eef720 filestore(/mnt/osd.17) error
>> error 117: Structure needs cleaning not handled
>
> was there anything in dmesg/system logs right at this point? XFS should
> have said something about this original error.

Whoops.
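[Editor's aside, not from the thread: the -117 that the Ceph filestore logged is a raw Linux errno, and the "error 5" in the XFS messages quoted below is another. A minimal sketch to decode them, assuming a Linux box (errno numbering is platform-specific):]

```python
import errno
import os

# Decode the raw errno values seen in the logs.
# On Linux, 117 is EUCLEAN ("Structure needs cleaning"), which filesystems
# return once they detect on-disk corruption; 5 is EIO, which XFS keeps
# returning from xfs_log_force after a forced shutdown.
for code in (117, 5):
    name = errno.errorcode.get(code, "?")      # symbolic name, e.g. EIO
    print(f"errno {code}: {name} -> {os.strerror(code)}")
```

So the filestore's truncate failure and the later log-force noise are two views of the same event: the filesystem flagged corruption and shut itself down, after which all I/O fails with EIO.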
The following is a sample of what was in dmesg and kern.log after that
point but before I did anything else (it repeated a lot, but there weren't
any other lines of output):

xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8
[56459.526220] XFS (sdg1): xfs_log_force: error 5 returned.
[56489.544153] XFS (sdg1): xfs_log_force: error 5 returned.
[56519.562087] XFS (sdg1): xfs_log_force: error 5 returned.
[56549.580021] XFS (sdg1): xfs_log_force: error 5 returned.
[56579.597956] XFS (sdg1): xfs_log_force: error 5 returned.
[56609.615889] XFS (sdg1): xfs_log_force: error 5 returned.
[56613.036430] XFS (sdg1): xfs_log_force: error 5 returned.
[56613.041731] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8
[56619.430497] XFS (sdg1): xfs_log_force: error 5 returned.
[56619.435796] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8

Thanks!
-Greg

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs