On Thu, 25 May 2017, Łukasz Chrustek wrote:
> Hi,
>
> > On Thu, 25 May 2017, Łukasz Chrustek wrote:
> >> Hi,
> >>
> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
> >> >> Hello,
> >> >>
> >> >> >>
> >> >> >> > This
> >> >> >>
> >> >> >> osd 6 - isn't startable
> >> >>
> >> >> > Disk completely 100% dead, or just broken enough that ceph-osd won't
> >> >> > start? ceph-objectstore-tool can be used to extract a copy of the 2 pgs
> >> >> > from this osd to recover any important writes on that osd.
> >> >>
> >> >> >> osd 10, 37, 72 are startable
> >> >>
> >> >> > With those started, I'd repeat the original sequence and get a fresh pg
> >> >> > query to confirm that it still wants just osd.6.
> >> >>
> >> >> > Use ceph-objectstore-tool to export the pg from osd.6, stop some other
> >> >> > random osd (not one of these ones), import the pg into that osd, and
> >> >> > start it again. Once it is up, 'ceph osd lost 6'. The pg *should* peer
> >> >> > at that point. Repeat the same basic process with the other pg.
> >> >>
> >> >> Here is the output from ceph-objectstore-tool - it also didn't succeed:
> >> >>
> >> >> https://pastebin.com/7XGAHdKH
> >>
> >> > Hmm, btrfs:
> >>
> >> > 2017-05-24 23:28:58.547456 7f500948e940 -1
> >> > filestore(/var/lib/ceph/osd/ceph-84) ERROR:
> >> > /var/lib/ceph/osd/ceph-84/current/nosnap exists, not rolling back to avoid
> >> > losing new data
> >>
> >> > You could try setting --osd-use-stale-snap as suggested.
> >>
> >> Yes... tried... and I simply get rided of 39GB data...
>
> > What does "get rided" mean?
>
> According to this pastebin: https://pastebin.com/QPcpkjg4
>
> ls -R /var/lib/ceph/osd/ceph-33/current/
> /var/lib/ceph/osd/ceph-33/current/:
> commit_op_seq  omap
>
> /var/lib/ceph/osd/ceph-33/current/omap:
> 000003.log  CURRENT  LOCK  MANIFEST-000002
>
> Earlier the same data files were there.

Okay, sorry I took a while to get back to you. It looks like I gave you
bad advice here! The 'nosnap' file means filestore was operating in
non-snapshotting mode, and the --osd-use-stale-snap warning that it would
lose data was real... it rolled back to an empty state and threw out the
data on the device. :( :(

I'm *very* sorry about this! I haven't looked at or worked with the
btrfs mode in ages (we don't recommend it and almost nobody uses it), but
I should have been paying closer attention.

What is the state of the cluster now?

sage
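
(For reference, a rough sketch of the export/import sequence described
above. The pg id 1.2ab and the target OSD id used here are only
placeholders, and both OSDs must be stopped while ceph-objectstore-tool
runs against them; adjust paths and the start command to your setup:

  # on the host of the dead osd.6, with the osd process stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
      --journal-path /var/lib/ceph/osd/ceph-6/journal \
      --pgid 1.2ab --op export --file /tmp/pg.1.2ab.export

  # on another, temporarily stopped osd (here osd.9, picked at random)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
      --journal-path /var/lib/ceph/osd/ceph-9/journal \
      --op import --file /tmp/pg.1.2ab.export

  # start that osd again, then mark the dead one lost
  systemctl start ceph-osd@9
  ceph osd lost 6 --yes-i-really-mean-it

  # the pg should then peer; check with
  ceph pg 1.2ab query
)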