Re: Recovering PGs from Dead OSD disk

On Sat, Jun 3, 2017 at 6:17 AM James Horner <humankind135@xxxxxxxxx> wrote:
Hi All

Thanks in advance for any help; I was wondering if anyone could help me with a pickle I have gotten myself into!

I was in the process of adding OSDs to my small cluster (6 OSDs) when the disk died halfway through. Unfortunately I had left the defaults from when I was bootstrapping the cluster in place, which meant that size=1. I have managed to mount the old OSD disk and copy off all the files in the root and some of the files under current, but when I try to recover the PGs using ceph-objectstore-tool I get the following:

ceph-objectstore-tool --op export --pgid 7.cb --data-path /mnt/temp --journal-path /mnt/temp/journal --skip-journal-replay --debug --file 7.cb.export
2017-06-03 14:03:39.906546 7f2b217cc940  0 filestore(/mnt/temp) backend xfs (magic 0x58465342)
2017-06-03 14:03:39.907153 7f2b217cc940  0 genericfilestorebackend(/mnt/temp) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-06-03 14:03:39.907162 7f2b217cc940  0 genericfilestorebackend(/mnt/temp) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-06-03 14:03:39.907193 7f2b217cc940  0 genericfilestorebackend(/mnt/temp) detect_features: splice is supported
2017-06-03 14:03:39.999729 7f2b217cc940  0 genericfilestorebackend(/mnt/temp) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-06-03 14:03:39.999865 7f2b217cc940  0 xfsfilestorebackend(/mnt/temp) detect_feature: extsize is disabled by conf
2017-06-03 14:03:40.000392 7f2b217cc940 -1 filestore(/mnt/temp) mount initial op seq is 0; something is wrong
Mount failed with '(22) Invalid argument'

I know there is data in pg 7.cb:
$ du -sh  /mnt/temp/current/7.cb*
15G    /mnt/temp/current/7.cb_head
0    /mnt/temp/current/7.cb_TEMP

I am unsure why it is trying to mount anything, since the disk is already mounted. Is one of the disabled config options stopping me from completing this, or am I missing something else? This pool was used for CephFS data (pool 6 is the metadata pool and is in a similar state). I have only copied the PG folders for the relevant pools; there are others on the disk, but I think they may be stored on the damaged section.

I suspect you didn't do a complete copy; there are a number of important xattrs on the files, and you're not going to get anywhere if you don't have the leveldb files. --skip-journal-replay may not be doing you any favors either, but I think it would depend on your disk state.
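Roughly, something like the following should show whether the copy kept the pieces a FileStore mount needs. This is untested here, the paths just follow the /mnt/temp layout above, and the source path in the rsync line is only a placeholder; treat it as a sketch, not a recipe:

# were the Ceph xattrs preserved on the copied objects?
getfattr -d -m - /mnt/temp/current/7.cb_head/* | head

# if I remember the FileStore layout right, the leveldb/omap directory
# and the op sequence file both live under current/ and must be present
ls /mnt/temp/current/omap
cat /mnt/temp/current/commit_op_seq

# a copy that preserves xattrs and hard links, e.g.:
rsync -aHAX /path/to/original-osd/ /mnt/temp/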

Could you not run ceph-objectstore-tool against your real disk? That'll be your best bet; otherwise you're probably out of luck. The best option after that would be to try to reconstruct the CephFS files by pulling them out of the raw disk objects, which may be aided by running the CephFS repair/recovery tools so that you at least get access to everything that is still on a working OSD. http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
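For the archives, a rough sketch of that export/import path. The OSD ids and paths below are placeholders, and both OSDs should be stopped while you run it:

# export the PG straight from the original OSD's data directory
ceph-objectstore-tool --op export --pgid 7.cb \
    --data-path /var/lib/ceph/osd/ceph-NN \
    --journal-path /var/lib/ceph/osd/ceph-NN/journal \
    --file 7.cb.export

# then import it into a healthy OSD and restart that OSD
ceph-objectstore-tool --op import \
    --data-path /var/lib/ceph/osd/ceph-MM \
    --journal-path /var/lib/ceph/osd/ceph-MM/journal \
    --file 7.cb.export

The CephFS side of things (cephfs-journal-tool, cephfs-data-scan) is covered in the disaster-recovery doc linked above.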
-Greg
 

Thanks for any help
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
