On 5/11/15 9:47 AM, Ric Wheeler wrote:
> On 05/05/2015 04:13 AM, Yujian Peng wrote:
>> Emmanuel Florac <eflorac@...> writes:
>>
>>> On Mon, 4 May 2015 07:00:32 +0000 (UTC)
>>> Yujian Peng <pengyujian5201314 <at> 126.com> wrote:
>>>
>>>> I'm encountering a data disaster. I have a ceph cluster with 145 OSDs.
>>>> The data center had a power problem yesterday, and all of the ceph
>>>> nodes went down. Now I find that 6 disks (xfs) in 4 nodes have
>>>> data corruption. Some disks cannot be mounted, and some disks have
>>>> IO errors in syslog.
>>>>   mount: Structure needs cleaning
>>>>   xfs_log_force: error 5 returned
>>>> I tried to repair one with xfs_repair -L /dev/sdx1, but then ceph-osd
>>>> reported a leveldb error:
>>>>   Error initializing leveldb: Corruption: checksum mismatch
>>>> I cannot start the 6 OSDs, and 22 PGs are down.
>>>> This is really a tragedy for me. Can you give me some ideas for
>>>> recovering the XFS filesystems? Thanks very much!
>>> For XFS problems, ask the XFS ML: xfs <at> oss.sgi.com
>>>
>>> You didn't give enough details, by far. What version of kernel and
>>> distro are you running? If there were errors, please post extensive
>>> logs. If you have IO errors on some disks, you probably MUST replace
>>> them before going any further.
>>>
>>> Why did you run xfs_repair -L? Did you try xfs_repair without options
>>> first? Were you running the very latest version of xfs_repair
>>> (3.2.2)?
>>>
>> The OS is Ubuntu 12.04.5 with kernel 3.13.0
>> uname -a
>> Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20
>> UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> cat /etc/issue
>> Ubuntu 12.04.5 LTS \n \l
>> xfs_repair -V
>> xfs_repair version 3.1.7
>> I tried xfs_repair without options, but it showed me some errors, so I
>> used the -L option.
>> Thanks for your reply!
>>
>
> Responding quickly to a couple of things:
>
> * xfs_repair -L wipes out the XFS log, not normally a good thing to do

And if it is required because the log cannot be replayed, that often
indicates some problem with the storage system. For example, a volatile
write cache that was not flushed when needed and was lost along with the
power, leading to a corrupted and unreplayable XFS log.

> * replacing disks with IO errors is not a great idea if you still
> need that data. You might want to copy the data from that disk to a
> new disk (same or greater size) and then try to repair that new disk.
> A lot depends on the type of IO error you see - you might have cable
> issues, HBA issues, or fairly normal read issues (which are not worth
> replacing a disk for).

Just a note that XFS sometimes starts saying "IO error" when the
filesystem has shut down; this isn't the same as a block-device-level
IO error. But you haven't posted logs or anything, so I'm just guessing
here.

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

-Eric

> You should work with your vendor's support team if you have a support
> contract or post to the XFS devel list (copied above) for help.
>
> Good luck!
>
> Ric
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
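
For anyone following along, Ric's suggestion above (copy the suspect
disk to a new disk of the same or greater size, then repair only the
copy) could be sketched roughly as below. This is a minimal sketch, not
a tested recovery procedure: clone_device is a hypothetical helper name,
/dev/sdX1 and /dev/sdY1 are placeholders, and plain dd is assumed here
only because it is available everywhere.

```shell
# Sketch: clone a failing device (or image file) before attempting repair.
# clone_device is a hypothetical helper name, not part of xfsprogs.
clone_device() {
    src=$1
    dst=$2
    # conv=noerror continues past read errors; conv=sync pads each short
    # or failed read with zeros so offsets on the copy match the source.
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example usage (placeholder device names; run as root, and double-check
# which device is the source and which is the destination):
#   clone_device /dev/sdX1 /dev/sdY1
#   xfs_repair -n /dev/sdY1   # -n: no-modify mode, report problems only
```

Running xfs_repair -n on the copy first shows what a real repair would
change without writing anything, and the original disk stays untouched
if the repair goes badly.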