Just another quick question: do you know if your RAID controller is disabling the local disks' write caches? I'm wondering how this corruption occurred, and whether it is specific to your hardware/software configuration or a general Ceph vulnerability to sudden power loss. Normally write barriers should protect against this sort of thing, but a hardware RAID controller may not be passing flushes all the way down to the disks. (A sketch of how those settings could be checked is appended below the quoted thread.)

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Nick Fisk
> Sent: 05 May 2015 07:46
> To: 'Yujian Peng'; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: xfs corruption, data disaster!
>
> This is probably similar to what you want to try to do, but also mark those
> failed OSDs as lost, as I don't think you will have much luck getting them
> back up and running.
>
> http://ceph.com/community/incomplete-pgs-oh-my/#more-6845
>
> The only other option would be if anyone knows a way to rebuild the leveldb
> by indexing the contents of the filestore, but I suspect it would amount to
> much the same thing.
>
> But please get a second opinion before doing anything.
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > Of Yujian Peng
> > Sent: 05 May 2015 02:14
> > To: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: xfs corruption, data disaster!
> >
> > Emmanuel Florac <eflorac@...> writes:
> >
> > > Le Mon, 4 May 2015 07:00:32 +0000 (UTC) Yujian Peng
> > > <pengyujian5201314 <at> 126.com> wrote:
> > >
> > > > I'm encountering a data disaster. I have a Ceph cluster with 145 OSDs.
> > > > The data center had a power problem yesterday, and all of the Ceph
> > > > nodes went down. Now I find that 6 disks (XFS) in 4 nodes have
> > > > data corruption. Some disks cannot be mounted, and some disks
> > > > have I/O errors in syslog:
> > > >   mount: Structure needs cleaning
> > > >   xfs_log_force: error 5 returned
> > > > I tried to repair one with xfs_repair -L /dev/sdx1, but the
> > > > ceph-osd then reported a leveldb error:
> > > >   Error initializing leveldb: Corruption: checksum mismatch
> > > > I cannot start the 6 OSDs, and 22 PGs are down.
> > > > This is really a tragedy for me. Can you give me some ideas to
> > > > recover the XFS filesystems? Thanks very much!
> > >
> > > For XFS problems, ask the XFS ML: xfs <at> oss.sgi.com
> > >
> > > You didn't give enough details, by far. What version of kernel and
> > > distro are you running? If there were errors, please post extensive
> > > logs. If you have I/O errors on some disks, you probably MUST replace
> > > them before going any further.
> > >
> > > Why did you run xfs_repair -L? Did you try xfs_repair without
> > > options first? Were you running the very latest version of
> > > xfs_repair (3.2.2)?
> > >
> > The OS is Ubuntu 12.04.5 with kernel 3.13.0:
> >   uname -a
> >   Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> >   cat /etc/issue
> >   Ubuntu 12.04.5 LTS \n \l
> >   xfs_repair -V
> >   xfs_repair version 3.1.7
> > I've tried xfs_repair without options, but it showed me some errors, so I
> > used the -L option.
> > Thanks for your reply!
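For reference, a minimal sketch of how the disk write-cache and barrier settings in question could be checked on one of the affected nodes. The device name (/dev/sdb) and the OSD mount path are placeholders rather than details from the thread, and behind some hardware RAID controllers the physical disks are not visible to hdparm/sdparm at all, in which case the controller vendor's CLI has to be used instead:

  # Is the drive's volatile write cache enabled? (ATA drives)
  hdparm -W /dev/sdb

  # Same check via the SCSI WCE (Write Cache Enable) bit (SAS, or SATA via the sd driver)
  sdparm --get=WCE /dev/sdb

  # Are the OSD filesystems mounted with barriers/flushes enabled?
  # XFS enables barriers by default; a "nobarrier" option here would be a red flag.
  grep /var/lib/ceph/osd /proc/mounts

  # If the controller cache is not battery/flash backed, disabling the drive's
  # own cache is one option (verify against the controller documentation first):
  hdparm -W0 /dev/sdb

If hdparm/sdparm cannot see the drives, the equivalent setting is usually exposed in the RAID controller's management tool as a per-drive "disk cache" policy.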
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com