3 corrupted OSDs

Hi all,

We are running a fairly small Ceph cluster (v13.2.6) with a single host and 8 OSDs, and are planning to expand to a more conventional setup with 3 hosts and more OSDs.

However, tonight one of our redundant PSUs died. The failover itself worked, but the event appears to have corrupted 3 of the 8 OSDs.
All pools have a replication size of 2.
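
With size=2 and everything on one host, I assume at least some PGs had both copies on the failed OSDs. For checking which PGs are actually affected, I would look at something like:

[root@tecoceph ~]# ceph health detail
[root@tecoceph ~]# ceph pg dump_stuck unclean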

All OSDs are BlueStore with RocksDB; there is no external journal, DB, or WAL device.
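
(For reference, this is how I verified there is no separate DB/WAL device: a BlueStore OSD with external devices would have block.db / block.wal symlinks next to the block symlink, and ours only have block:

[root@tecoceph osd]# ls -l /var/lib/ceph/osd/ceph-3/ | grep block
)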

Two of them report a missing RocksDB:

Jun 30 01:33:32 tecoceph systemd[1]: Starting Ceph object storage daemon osd.3...
Jun 30 01:33:32 tecoceph systemd[1]: Started Ceph object storage daemon osd.3.
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.242 7f2666a75d80 -1 Public network was set, but cluster network was not set
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.242 7f2666a75d80 -1     Using public network also for cluster network
Jun 30 01:33:32 tecoceph ceph-osd[11431]: starting osd.3 at - osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.898 7f2666a75d80 -1 rocksdb: NotFound:
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.898 7f2666a75d80 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_db erroring opening db:
Jun 30 01:33:33 tecoceph ceph-osd[11431]: 2019-06-30 01:33:33.267 7f2666a75d80 -1 osd.3 0 OSD:init: unable to mount object store
Jun 30 01:33:33 tecoceph ceph-osd[11431]: 2019-06-30 01:33:33.267 7f2666a75d80 -1  ** ERROR: osd init failed: (5) Input/output error
Jun 30 01:33:33 tecoceph systemd[1]: ceph-osd@3.service: main process exited, code=exited, status=1/FAILURE

So I tried working with ceph-bluestore-tool. Reading the label still works:

[root@tecoceph osd]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-3/
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-3//block": {
        "osd_uuid": "c28c092c-00aa-4db0-9925-642bf99f0662",
        "size": 8001561821184,
        "btime": "2018-05-28 22:44:58.712336",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a9493143-3e4e-450e-b3b8-28508d48d412",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQBG************************",
        "ready": "ready",
        "whoami": "3"
    }
}

The label looks intact, so the failure seems to happen later, when BlueFS/RocksDB is opened. A deep fsck fails at the same point:

[root@tecoceph osd]# ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-3/
2019-06-30 14:38:35.998 7f9947432940 -1 rocksdb: NotFound:
2019-06-30 14:38:35.998 7f9947432940 -1 bluestore(/var/lib/ceph/osd/ceph-3/) _open_db erroring opening db:
error from fsck: (5) Input/output error

Trying to access the RocksDB with ceph-kvstore-tool fails as well:
[root@tecoceph osd]# ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-3 list
2019-06-30 14:39:36.021 7faa747e8a80  1 rocksdb: do_open column families: []
failed to open type 2019-06-30 14:39:36.022 7faa747e8a80 -1 rocksdb: Invalid argument: /var/lib/ceph/osd/ceph-3: does not exist (create_if_missing is false)
rocksdb path /var/lib/ceph/osd/ceph-3: (22) Invalid argument
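
As far as I understand, the rocksdb backend of ceph-kvstore-tool expects a plain RocksDB directory on a regular filesystem, while here the DB lives inside BlueFS, so the bluestore-kv backend would presumably be the right one:

[root@tecoceph osd]# ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 list

I would expect that to hit the same _open_db error as the OSD itself, though.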

Attempting a repair with ceph-kvstore-tool results in a segmentation fault:
[root@tecoceph osd]# ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-3 repair
*** Caught signal (Segmentation fault) **
 in thread 7ff8fde03a80 thread_name:ceph-kvstore-to
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5d0) [0x7ff8f23925d0]
 2: (main()+0x2c4) [0x55ae6dadb4e4]
 3: (__libc_start_main()+0xf5) [0x7ff8f0d673d5]
 4: (()+0x21dde0) [0x55ae6dbafde0]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The third OSD aborts on startup because of a bad table magic number in one of its SST files, and every tool I try crashes against it in the same way:
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:27.805 7fa9bd453d80 -1 Public network was set, but cluster network was not set
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:27.805 7fa9bd453d80 -1     Using public network also for cluster network
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.771 7fa9bd453d80 -1 abort: Corruption: Bad table magic number: expected 9863518390377041911, found 15656361161312523986 in db/002923.sst
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.831 7fa9bd453d80 -1 *** Caught signal (Aborted) **
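
One idea I had for this one: if I understand the tool correctly, ceph-bluestore-tool can export the BlueFS contents (the embedded db/ directory with the .sst files) onto a regular filesystem, which might at least allow inspecting or repairing the RocksDB offline, roughly like this (the out-dir and the OSD id are placeholders):

[root@tecoceph osd]# ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-5 --out-dir /mnt/recovery/osd-5
[root@tecoceph osd]# ceph-kvstore-tool rocksdb /mnt/recovery/osd-5/db list

I have not tried this yet and do not know whether the export works when an SST file is corrupted.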

Is there any way to recover any of these OSDs?
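
My fallback idea would be to salvage whatever PGs are still readable with ceph-objectstore-tool and import them into healthy OSDs, along these lines:

[root@tecoceph osd]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --op list-pgs
[root@tecoceph osd]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid <pgid> --op export --file /mnt/recovery/<pgid>.export

but I assume that fails as long as BlueStore cannot open its RocksDB at all. Any hints would be greatly appreciated.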

Karlsruhe Institute of Technology (KIT)
Pervasive Computing Systems – TECO
Prof. Dr. Michael Beigl
IT
Christian Wahl

Vincenz-Prießnitz-Str. 1
Building 07.07., 2nd floor
76131 Karlsruhe, Germany
