Salvage CephFS after lost PG

Hi all,


I'm looking for some suggestions on how to do something inappropriate. 


In a nutshell, I've lost the WAL/DB for three bluestore OSDs on a small cluster and, as a result of those three OSDs going offline, I've lost a placement group (7.a7). How I achieved this feat is an embarrassing mistake, which I don't think has bearing on my question.


The OSDs were created a few months ago with ceph-deploy:

/usr/local/bin/ceph-deploy --overwrite-conf osd create --bluestore --data /dev/vdc1 --block-db /dev/vdf1 ceph-a


With the 3 OSDs out, I'm sitting at OSD_BACKFILLFULL.


First, PG 7.a7 belongs to the data pool rather than the metadata pool. If I run "cephfs-data-scan pg_files / 7.a7", I get a list of 4149 files/objects, but then it hangs. I don't understand why this would hang if only the data pool is impacted (since pg_files only operates on the metadata pool?).
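

(For reference, I believe this is the standard way to confirm which pool a PG belongs to and where it currently maps; I haven't pasted the output here, and pool 7 should be my data pool's ID.)

# list pools with their numeric IDs (pool 7 should be the CephFS data pool)
ceph osd pool ls detail
# show which OSDs PG 7.a7 maps to / is acting on
ceph pg map 7.a7
# list PGs stuck inactive, which should include 7.a7
ceph pg dump_stuck inactive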


The ceph-log shows:

cluster [WRN] slow request 30.894832 seconds old, received at 2019-01-20 18:00:12.555398: client_request(client.25017730:218006 lookup #0x10001c8ce15/000001 2019-01-20 18:00:12.550421 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting


Is the hang perhaps related to the OSD_BACKFILLFULL? If so, I could add some completely new OSDs to fix that problem. I have held off doing that for now, as it will trigger a lot of data movement which might turn out to be unnecessary.
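

For what it's worth, my plan was to first check the current ratios and per-OSD utilisation, and possibly nudge the backfillfull threshold up a little as a stop-gap instead of adding OSDs straight away (the 0.92 below is just an example value, not something I've applied yet):

# show the configured full/backfillfull/nearfull ratios and per-OSD utilisation
ceph osd dump | grep ratio
ceph osd df
# temporarily raise the backfillfull threshold (example value only)
ceph osd set-backfillfull-ratio 0.92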


Or is the hang indeed related to the missing PG?
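

To try to tell the two apart, I was going to look at what the MDS is actually blocked on via its admin socket (mds.ceph-a is just a guess at my daemon's name; substitute the active MDS):

# on the active MDS host: which client requests are stuck, and on what
ceph daemon mds.ceph-a dump_ops_in_flight
# whether the MDS has outstanding OSD operations (e.g. reads/writes against 7.a7)
ceph daemon mds.ceph-a objecter_requests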


Second, if I try to copy files out of the CephFS filesystem, I get a few hundred files and then that too hangs. None of the files I'm attempting to copy are listed in the pg_files output (although, since pg_files hangs, perhaps it hadn't got to those files yet). Again, shouldn't I be able to access files which are not associated with the missing data-pool PG?
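

In case it matters, my understanding is that (with the default file layout) a file's data objects are named <inode-in-hex>.<object-index>, so I can check whether a particular stuck file actually lands in 7.a7 along these lines (the path and the pool name cephfs_data are placeholders for my setup):

# inode of a file that hangs on copy (path is an example)
ino=$(stat -c %i /mnt/cephfs/some/stuck/file)
# map that file's first data object to a PG
ceph osd map cephfs_data "$(printf '%x' "$ino").00000000"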


Lastly, I want to know whether there is some way to recreate the WAL/DB while leaving the OSD data intact, and/or to fool one of the OSDs into thinking everything is OK so that it can serve up the data it holds for the missing PG.
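

The closest tooling I've found is ceph-bluestore-tool, but as far as I can tell attaching a new DB only gives the OSD an empty RocksDB, so it doesn't bring back the metadata that lived on the lost device (and it may refuse to run at all while the original block.db is missing). I'd love to be told otherwise. Roughly what I was looking at, with placeholder paths/devices, and noting that bluefs-bdev-new-db may depend on the exact mimic point release:

# inspect what the OSD's data device thinks its companion devices are
ceph-bluestore-tool show-label --dev /dev/vdc1
# consistency check of what is left on the main device (OSD must be stopped)
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-N
# attach a brand-new (empty) DB device; this does NOT recover the old DB contents
# (/dev/vdf2 is a placeholder)
ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-N --dev-target /dev/vdf2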


From reading the mailing list and documentation, I know that this is not a "safe" operation:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021713.html

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024268.html


However, my current status is an unusable CephFS and limited access to the data. I'd like to get as much data off it as possible; then I expect I'll have to recreate it. Between the backups I have and what I can salvage from the cluster, I should hopefully have most of what I need.


I know what I *should* have done, but now I'm at this point, I know I'm asking for something which would never be required on a properly-run cluster.


If it really is not possible to get the (possibly corrupt) PG back again, can I get the cluster back so the remainder of the files are accessible?
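

My current understanding, and please correct me if this is wrong, is that if the PG's contents really are gone, the way forward is to tell the cluster the dead OSDs aren't coming back and then recreate 7.a7 as an empty PG, accepting that anything in it is lost:

# for each of the three dead OSDs, declare it permanently lost
ceph osd lost <osd-id> --yes-i-really-mean-it
# recreate the lost PG as empty (destroys any chance of recovering its contents;
# newer releases may also require --yes-i-really-mean-it)
ceph osd force-create-pg 7.a7

After that, I assume I'd still need some combination of an MDS scrub and cleanup of the files whose objects lived in that PG.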


Currently running mimic 13.2.4 on all nodes.


Status:

$ ceph health detail - https://gist.github.com/kawaja/f59d231179b3186748eca19aae26bcd4

$ ceph fs get main - https://gist.github.com/kawaja/a7ab0b285d53dee6a950a4310be4fa5a


Any advice on where I could go from here would be greatly appreciated.


thanks,

rik.

