Re: rocksdb corruption, stale pg, rebuild bucket index

On 13.06.19 15:52, Sage Weil wrote:
> On Thu, 13 Jun 2019, Harald Staub wrote:
>> [...]
>>> I think that increasing the various suicide timeout options will allow
>>> it to stay up long enough to clean up the ginormous objects:
>>>
>>>   ceph config set osd.NNN osd_op_thread_suicide_timeout 2h

>> ok

>> It looks healthy so far:
>>   ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-266 fsck
>>   fsck success
>>
>> Now we have to choose how to continue, trying to reduce the risk of losing
>> data (most bucket indexes are intact currently). My guess would be to let this
>> OSD (which was not the primary) go in and hope that it recovers. In case of a
>> problem, maybe we could still use the other OSDs "somehow"? In case of
>> success, we would bring back the other OSDs as well?
>>
>> OTOH we could try to continue with the key dump from earlier today.

> I would start all three osds the same way, with 'noout' set on the
> cluster.  You should try to avoid triggering recovery because it will have
> a hard time getting through the big index object on that bucket (i.e., it
> will take a long time, and might trigger some blocked ios and so forth).

This I do not understand: how would I avoid recovery?
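
What I have in mind so far are only the cluster-wide flags below; a rough sketch, and norecover/nobackfill are just my guess at what "avoid triggering recovery" could mean here:

  # keep OSDs from being marked out while they are brought up
  ceph osd set noout
  # my guess: also pause recovery and backfill until the big index object is cleaned up
  ceph osd set norecover
  ceph osd set nobackfill
  # to be unset again once the cluster looks stable
  ceph osd unset nobackfill
  ceph osd unset norecover
  ceph osd unset noout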

> (Side note that since you started the OSD read-write using the internal
> copy of rocksdb, don't forget that the external copy you extracted
> (/mnt/ceph/db?) is now stale!)
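
Understood. Should we need a current external copy again, I assume it could be re-extracted the same way once the OSD is stopped; a sketch, with /mnt/ceph/db standing in for whatever directory was used before:

  # dump the BlueFS files (including the rocksdb database) to a directory
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-266 \
      --out-dir /mnt/ceph/db bluefs-export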

As suggested by Paul Emmerich (see the next e-mail in this thread), I exported this PG. It did not take that long (20 minutes).
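
For the record, this is roughly how such a PG export looks with ceph-objectstore-tool; a sketch rather than the exact invocation, with the pgid and output file as placeholders, and the OSD stopped while exporting:

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-266 \
      --pgid <PGID> --op export --file /mnt/ceph/pg.export

The resulting file can later be brought back into an OSD with --op import.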

Thank you!
 Harry
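
PS: Regarding the suicide timeout override quoted above: once the cleanup is through, I assume it can simply be dropped from the config database again so that the default applies (a sketch, with osd.266 standing in for osd.NNN):

  ceph config rm osd.266 osd_op_thread_suicide_timeout
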
[...]


