Re: Salvage CEPHFS after lost PG

Thanks Marc,

When I next have physical access to the cluster, I’ll add some more OSDs. Would that cause the hanging though?

No takers on the bluestore salvage?

thanks,
rik.

On 20 Jan 2019, at 20:36, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:


If you have a backfillfull OSD, no PGs will be able to migrate.
It is better to just add hard drives, because at least one of your
OSDs is too full.

I know you can set the backfillfull ratios with commands like these:
ceph tell osd.* injectargs '--mon_osd_full_ratio=0.970000'
ceph tell osd.* injectargs '--mon_osd_backfillfull_ratio=0.950000'

ceph tell osd.* injectargs '--mon_osd_full_ratio=0.950000'
ceph tell osd.* injectargs '--mon_osd_backfillfull_ratio=0.900000'
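
I think on Luminous and later the same thresholds can also be raised
cluster-wide on the mons (values below are only examples, and raising
them only buys temporary headroom, it does not fix a full OSD):

ceph osd set-backfillfull-ratio 0.95
ceph osd set-full-ratio 0.97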

Or maybe decrease the weight of the full OSD. Check the OSDs with
'ceph osd status' and make sure your nodes have an even distribution
of the storage.
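
For example, something like this ('3' is just a placeholder for the id
of whichever OSD is too full):

ceph osd df tree
ceph osd reweight 3 0.90

'ceph osd crush reweight osd.3 <weight>' is the more permanent variant,
but either way expect some data movement.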

-----Original Message-----
From: Rik [mailto:rik@xxxxxxxxxx]
Sent: zondag 20 januari 2019 8:47
To: ceph-users@xxxxxxxxxxxxxx
Subject: Salvage CEPHFS after lost PG

Hi all,

I'm looking for some suggestions on how to do something inappropriate.

In a nutshell, I've lost the WAL/DB for three bluestore OSDs on a small
cluster and, as a result of those three OSDs going offline, I've lost a
placement group (7.a7). How I achieved this feat is an embarrassing
mistake, which I don't think has bearing on my question.

The OSDs were created a few months ago with ceph-deploy:

/usr/local/bin/ceph-deploy --overwrite-conf osd create --bluestore
--data /dev/vdc1 --block-db /dev/vdf1 ceph-a
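
For what it's worth, the bluestore label on each device can still be
read with something like:

ceph-bluestore-tool show-label --dev /dev/vdc1
ceph-bluestore-tool show-label --dev /dev/vdf1

which should at least show what is still readable on each device and
which role (main data vs block.db) it played for the OSD.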

With the 3 OSDs out, I'm sitting at OSD_BACKFILLFULL.

First, PG 7.a7 belongs to the data pool rather than the metadata pool,
and if I run "cephfs-data-scan pg_files / 7.a7" I get a list of 4149
files/objects, but then it hangs. I don't understand why this would
hang if it's only the data pool which is impacted (since pg_files only
operates on the metadata pool?).

The ceph-log shows:

cluster [WRN] slow request 30.894832 seconds old, received at
2019-01-20 18:00:12.555398: client_request(client.25017730:218006
lookup #0x10001c8ce15/000001 2019-01-20 18:00:12.550421 caller_uid=0,
caller_gid=0{}) currently failed to rdlock, waiting

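One way to see what such a request is actually blocked on is the MDS
admin socket (the mds name below is a placeholder), e.g.:

ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> objecter_requests

The second command should show whether the MDS itself has OSD
operations stuck against a down or full PG.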

Is the hang perhaps related to the OSD_BACKFILLFULL? If so, I could add
some completely new OSDs to fix that problem. I have held off doing that
for now as that will trigger a whole lot of data movement which might be
unnecessary.
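
If it comes to that, my understanding is that the immediate data
movement can be held off by setting the norebalance/nobackfill flags
before adding the OSDs, and unsetting them later:

ceph osd set norebalance
ceph osd set nobackfill
(add the new OSDs)
ceph osd unset nobackfill
ceph osd unset norebalance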

Or is the hang indeed related to the missing PG?

Second, if I try to copy files out of the CEPHFS filesystem, I get a
few hundred files and then it too hangs. None of the files I'm
attempting to copy are listed in the pg_files output (although since
pg_files hangs, perhaps it hadn't got to those files yet). Again,
shouldn't I be able to access files which are not associated with the
missing data pool PG?
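
As a sanity check on individual files, it should be possible to work
out which PG a file's first object lands in from its inode number (the
first object is named <inode-in-hex>.00000000). With placeholder path
and pool name:

ino=$(stat -c %i /mnt/cephfs/some/file)
printf -v obj '%x.00000000' "$ino"
ceph osd map cephfs_data "$obj"

If that reports pg 7.a7 for one of the hanging files, the hang would
at least be explained.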

Lastly, I want to know if there is some way to recreate the WAL/DB while
leaving the OSD data intact and/or fool one of the OSDs into thinking
everything is OK, allowing it to serve up the data it has in the missing
PG.
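
One thing I can at least check is whether the main data device is
still readable at all without its DB, e.g.:

ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>

I'd expect this to fail if the RocksDB metadata only lived on the lost
device, but the error output might say more about what is left.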

From reading the mailing list and documentation, I know that this is not
a "safe" operation:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021713.html

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024268.html

However, my current status indicates an unusable CEPHFS and limited
access to the data. I'd like to get as much data off it as possible and
then I expect to have to recreate it. With a combination of the backups
I have and what I can salvage from the cluster, I should hopefully have
most of what I need.

I know what I *should* have done, but now that I'm at this point, I
know I'm asking for something which would never be required on a
properly-run cluster.

If it really is not possible to get the (possibly corrupt) PG back
again, can I get the cluster back so the remainder of the files are
accessible?
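
My understanding is that, if the PG really is unrecoverable, the usual
last resort is to accept the loss and recreate it empty so I/O against
it stops blocking, something like:

ceph osd force-create-pg 7.a7

(newer releases may also want a --yes-i-really-mean-it flag), with
whatever data lived in 7.a7 then gone for good. I'd obviously prefer
to avoid that if there is any chance of salvage.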

Currently running mimic 13.2.4 on all nodes.

Status:

$ ceph health detail -
https://gist.github.com/kawaja/f59d231179b3186748eca19aae26bcd4

$ ceph fs get main -
https://gist.github.com/kawaja/a7ab0b285d53dee6a950a4310be4fa5a

Any advice on where I could go from here would be greatly appreciated.

thanks,

rik.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
