Thanks Marc,
When I next have physical access to the cluster, I’ll add some more OSDs. Would that cause the hanging though?
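For what it's worth, I expect to create the new OSDs the same way as the existing ones, just without a separate DB device, so something like the command below (the device name is a placeholder until I see what the new disks enumerate as, and the host will be whichever node they end up in):

# same form as the original OSDs, minus --block-db (data and DB co-located on one device)
/usr/local/bin/ceph-deploy --overwrite-conf osd create --bluestore --data /dev/vdX1 ceph-a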
No takers on the bluestore salvage?

If you have a backfillfull, no PGs will be able to migrate. Better to just add hard drives, because at least one of your OSDs is too full.

I know you can set the backfillfull ratios with commands like these:

ceph tell osd.* injectargs '--mon_osd_full_ratio=0.970000'
ceph tell osd.* injectargs '--mon_osd_backfillfull_ratio=0.950000'
ceph tell osd.* injectargs '--mon_osd_full_ratio=0.950000'
ceph tell osd.* injectargs '--mon_osd_backfillfull_ratio=0.900000'

Or maybe decrease the weight of the full OSD, check the OSDs with 'ceph osd status' and make sure your nodes have an even distribution of the storage.
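Something like this, for example (the osd id 5 and the weight 0.90 are just placeholders, check the output of the first command to see which OSD is really over the threshold):

ceph osd df tree          # shows size, %USE and reweight per OSD, grouped by host
ceph osd reweight 5 0.90  # temporarily lower the override weight of the full OSD so some PGs move off it

Once it has rebalanced you can put it back with 'ceph osd reweight 5 1.0'.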
-----Original Message-----
From: Rik [mailto:rik@xxxxxxxxxx]
Sent: Sunday, 20 January 2019 8:47
To: ceph-users@xxxxxxxxxxxxxx
Subject: Salvage CEPHFS after lost PG

Hi all,

I'm looking for some suggestions on how to do something inappropriate. In a nutshell, I've lost the WAL/DB for three bluestore OSDs on a small cluster and, as a result of those three OSDs going offline, I've lost a placement group (7.a7). How I achieved this feat is an embarrassing mistake, which I don't think has any bearing on my question.

The OSDs were created a few months ago with ceph-deploy:

/usr/local/bin/ceph-deploy --overwrite-conf osd create --bluestore --data /dev/vdc1 --block-db /dev/vdf1 ceph-a

With the 3 OSDs out, I'm sitting at OSD_BACKFILLFULL.

First, PG 7.a7 belongs to the data pool rather than the metadata pool, and if I run "cephfs-data-scan pg_files / 7.a7" I get a list of 4149 files/objects but then it hangs. I don't understand why this would hang if it's only the data pool which is impacted (since pg_files only operates on the metadata pool?). The ceph log shows:

cluster [WRN] slow request 30.894832 seconds old, received at 2019-01-20 18:00:12.555398: client_request(client.25017730:218006 lookup #0x10001c8ce15/000001 2019-01-20 18:00:12.550421 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting

Is the hang perhaps related to the OSD_BACKFILLFULL? If so, I could add some completely new OSDs to fix that problem. I have held off doing that for now, as it will trigger a whole lot of data movement which might be unnecessary. Or is the hang indeed related to the missing PG?

Second, if I try to copy files out of the CephFS filesystem, I get a few hundred files and then it too hangs. None of the files I’m attempting to copy are listed in the pg_files output (although, since pg_files hangs, perhaps it hadn't got to those files yet). Again, should I not be able to access files which are not associated with the missing data pool PG?

Lastly, I want to know if there is some way to recreate the WAL/DB while leaving the OSD data intact, and/or fool one of the OSDs into thinking everything is OK, allowing it to serve up the data it has in the missing PG. From reading the mailing list and documentation, I know that this is not a "safe" operation:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021713.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024268.html

However, my current status indicates an unusable CephFS and limited access to the data. I'd like to get as much data off it as possible and then I expect to have to recreate it. With a combination of the backups I have and what I can salvage from the cluster, I should hopefully have most of what I need.

I know what I *should* have done, but now that I'm at this point, I know I'm asking for something which would never be required on a properly-run cluster. If it really is not possible to get the (possibly corrupt) PG back again, can I get the cluster back so the remainder of the files are accessible?

Currently running Mimic 13.2.4 on all nodes.

Status:

$ ceph health detail - https://gist.github.com/kawaja/f59d231179b3186748eca19aae26bcd4
$ ceph fs get main - https://gist.github.com/kawaja/a7ab0b285d53dee6a950a4310be4fa5a

Any advice on where I could go from here would be greatly appreciated.

thanks,
rik.
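PS: On the WAL/DB question, the only tool I've come across is ceph-bluestore-tool, and I assume it can't bring the RocksDB metadata back from nothing, but for the record this is the sort of thing I was imagining (osd.7 is just an example id; /dev/vdc1 is the data device from my ceph-deploy command above):

ceph-bluestore-tool show-label --dev /dev/vdc1              # check the data device still has a valid bluestore label
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7    # offline consistency check of one of the dead OSDs

If anyone knows whether there is a sane path from there, I'm all ears.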