On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Hi,
>
> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
> >> Hi,
> >>
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Hello,
> >> >>
> >> >> >>
> >> >> >> > This
> >> >> >>
> >> >> >> osd 6 - isn't startable
> >> >>
> >> >> > Disk completely 100% dead, or just broken enough that ceph-osd won't
> >> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> >> > from this osd to recover any important writes on that osd.
> >> >>
> >> >> >> osd 10, 37, 72 are startable
> >> >>
> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> >> > query to confirm that it still wants just osd.6.
> >> >>
> >> >> > Use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> >> > random osd (not one of these ones), import the pg into that osd, and
> >> >> > start it again. Once it is up, 'ceph osd lost 6'. The pg *should* peer
> >> >> > at that point. Repeat the same basic process with the other pg.
> >> >>
> >> >> Here is the output from ceph-objectstore-tool - it also didn't succeed:
> >> >>
> >> >> https://pastebin.com/7XGAHdKH
> >>
> >> > Hmm, btrfs:
> >>
> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1
> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> >> > losing new data
> >>
> >> > You could try setting --osd-use-stale-snap as suggested.
> >>
> >> Yes... tried... and I simply get rided of 39GB data...
>
> > What does "get rided" mean?
>
> According to this pastebin: https://pastebin.com/QPcpkjg4
>
> ls -R /var/lib/ceph/osd/ceph-33/current/
> /var/lib/ceph/osd/ceph-33/current/:
> commit_op_seq  omap
>
> /var/lib/ceph/osd/ceph-33/current/omap:
> 000003.log  CURRENT  LOCK  MANIFEST-000002
>
> Earlier the same data files were there.

Okay, sorry I took a while to get back to you. It looks like I gave you
bad advice here! The 'nosnap' file means filestore was operating in
non-snapshotting mode, and the --osd-use-stale-snap warning that it would
lose data was real... it rolled back to an empty state and threw out the
data on the device. :( :(

I'm *very* sorry about this! I haven't looked at or worked with the
btrfs mode in ages (we don't recommend it and almost nobody uses it), but
I should have been paying closer attention.

What is the state of the cluster now?

sage
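
(For reference, a rough sketch of the export/import sequence described
above. The pg id 1.2ab and the target OSD id used here are only
placeholders, and both OSDs must be stopped while ceph-objectstore-tool
runs against them; adjust paths and the start command to your setup:

  # on the host of the dead osd.6, with the osd process stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
      --journal-path /var/lib/ceph/osd/ceph-6/journal \
      --pgid 1.2ab --op export --file /tmp/pg.1.2ab.export

  # on another, temporarily stopped osd (here osd.9, picked at random)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
      --journal-path /var/lib/ceph/osd/ceph-9/journal \
      --op import --file /tmp/pg.1.2ab.export

  # start that osd again, then mark the dead one lost
  systemctl start ceph-osd@9
  ceph osd lost 6 --yes-i-really-mean-it

  # the pg should then peer; check with
  ceph pg 1.2ab query
)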