On Mon, Nov 21, 2011 at 5:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> In other words, your admin basically told the system to shutdown
> without syncing the data or running shutdown scripts that sync data.
> i.e. it forces an immediate reboot while the system is still active,
> causing an unclean shutdown and guaranteed data loss.

And he's been yelled at appropriately. ;) But the data loss actually
isn't a problem for us here, as long as the filesystem isn't corrupted.

>> But I've been assured this shouldn't have been able to
>> corrupt the filesystem, so troubleshooting continues.
>
> That depends entirely on your hardware. Are you running with
> barriers enabled? If you don't have barriers active, then metadata
> corruption is entirely possible in this scenario, especially if the
> hardware does a drive reset or power cycle during the reboot
> procedure. Even with barriers, there are RAID controllers that
> enable back-end drive caches that fail to get flushed and hence
> can cause corruption on unclean shutdowns.

Barriers are on (at least, nobody turned them off), and the RAID card is
battery-backed. Here are the megacli dumps:
http://pastebin.com/yTskgzWG
http://pastebin.com/ekhczycy

Sorry if I seem too eager to assume it's an XFS bug, but Ceph is a magic
machine for taking stable filesystems and making them cry. :/

On Tue, Nov 22, 2011 at 7:06 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> Others have had good comments, but also:
>
>> 2011-11-17 16:00:37.294876 7f83f3eef720 filestore(/mnt/osd.17)
>> truncate meta/pginfo_12.7c8/0 size 0
>> 2011-11-17 16:00:37.483407 7f83f3eef720 filestore(/mnt/osd.17)
>> truncate meta/pginfo_12.7c8/0 size 0 = -117
>> 2011-11-17 16:00:37.483476 7f83f3eef720 filestore(/mnt/osd.17) error
>> error 117: Structure needs cleaning not handled
>
> was there anything in dmesg/system logs right at this point? XFS should
> have said something about this original error.

Whoops.
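[Editor's aside, not from the thread: the -117 that the Ceph filestore logged is a raw Linux errno, and the "error 5" in the XFS messages quoted below is another. A minimal sketch to decode them, assuming a Linux box (errno numbering is platform-specific):]

```python
import errno
import os

# Decode the raw errno values seen in the logs.
# On Linux, 117 is EUCLEAN ("Structure needs cleaning"), which filesystems
# return once they detect on-disk corruption; 5 is EIO, which XFS keeps
# returning from xfs_log_force after a forced shutdown.
for code in (117, 5):
    name = errno.errorcode.get(code, "?")      # symbolic name, e.g. EIO
    print(f"errno {code}: {name} -> {os.strerror(code)}")
```

So the filestore's truncate failure and the later log-force noise are two views of the same event: the filesystem flagged corruption and shut itself down, after which all I/O fails with EIO.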
The following is a sample of what was in dmesg and kern.log after that
point but before I did anything else (it repeated a lot, but there weren't
any other lines of output):

xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8
[56459.526220] XFS (sdg1): xfs_log_force: error 5 returned.
[56489.544153] XFS (sdg1): xfs_log_force: error 5 returned.
[56519.562087] XFS (sdg1): xfs_log_force: error 5 returned.
[56549.580021] XFS (sdg1): xfs_log_force: error 5 returned.
[56579.597956] XFS (sdg1): xfs_log_force: error 5 returned.
[56609.615889] XFS (sdg1): xfs_log_force: error 5 returned.
[56613.036430] XFS (sdg1): xfs_log_force: error 5 returned.
[56613.041731] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8
[56619.430497] XFS (sdg1): xfs_log_force: error 5 returned.
[56619.435796] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c. Return address = 0xffffffff811c2aa8

Thanks!
-Greg

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs