Hi,
rocksdb embedded in BlueStore should be opened with ceph-kvstore-tool like this:
ceph-kvstore-tool bluestore-kv <osd path> <command>
instead of just "rocksdb", which is for a rocksdb that lives on a regular file system.
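For your osd.3 that would look roughly like this (a sketch, not verified against your cluster; the OSD must be stopped first, since the tool needs exclusive access to the store):

```shell
# Make sure the OSD daemon is not running; two openers would corrupt the DB further.
systemctl stop ceph-osd@3

# "bluestore-kv" opens the RocksDB that lives inside BlueFS on the block device;
# the plain "rocksdb" backend expects DB files on a POSIX file system, which is
# why your earlier attempt failed with "does not exist (create_if_missing is false)".
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 list
```

Whether this gets past the "rocksdb: NotFound:" error depends on how badly the DB is damaged, but at least the tool will be looking in the right place.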
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Sun, Jun 30, 2019 at 2:49 PM Christian Wahl <wahl@xxxxxxxx> wrote:
Hi all,

we are running a pretty small instance of Ceph (v13.2.6) with 1 host and 8 OSDs and are planning to expand to a more default setup with 3 hosts and more OSDs.

However, tonight one of our redundant PSUs died and it did fail over, but it looks like this has corrupted 3 out of 8 OSDs. The pools all have a replication level of 2. All OSDs are BlueStore with rocksdb, no external journal or WAL.

2 of them report a missing rocksdb:

Jun 30 01:33:32 tecoceph systemd[1]: Starting Ceph object storage daemon osd.3...
Jun 30 01:33:32 tecoceph systemd[1]: Started Ceph object storage daemon osd.3.
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.242 7f2666a75d80 -1 Public network was set, but cluster network was not set
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.242 7f2666a75d80 -1 Using public network also for cluster network
Jun 30 01:33:32 tecoceph ceph-osd[11431]: starting osd.3 at - osd_data /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.898 7f2666a75d80 -1 rocksdb: NotFound:
Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.898 7f2666a75d80 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_db erroring opening db:
Jun 30 01:33:33 tecoceph ceph-osd[11431]: 2019-06-30 01:33:33.267 7f2666a75d80 -1 osd.3 0 OSD:init: unable to mount object store
Jun 30 01:33:33 tecoceph ceph-osd[11431]: 2019-06-30 01:33:33.267 7f2666a75d80 -1 ** ERROR: osd init failed: (5) Input/output error
Jun 30 01:33:33 tecoceph systemd[1]: ceph-osd@3.service: main process exited, code=exited, status=1/FAILURE

So I tried working with the bluestore-tool:

[root@tecoceph osd]# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-3/
inferring bluefs devices from bluestore path
{
    "/var/lib/ceph/osd/ceph-3//block": {
        "osd_uuid": "c28c092c-00aa-4db0-9925-642bf99f0662",
        "size": 8001561821184,
        "btime": "2018-05-28 22:44:58.712336",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "a9493143-3e4e-450e-b3b8-28508d48d412",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQBG************************",
        "ready": "ready",
        "whoami": "3"
    }
}

[root@tecoceph osd]# ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-3/
2019-06-30 14:38:35.998 7f9947432940 -1 rocksdb: NotFound:
2019-06-30 14:38:35.998 7f9947432940 -1 bluestore(/var/lib/ceph/osd/ceph-3/) _open_db erroring opening db:
error from fsck: (5) Input/output error

Trying to access the rocksdb with the kvstore-tool fails as well:

[root@tecoceph osd]# ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-3 list
2019-06-30 14:39:36.021 7faa747e8a80  1 rocksdb: do_open column families: []
failed to open type
2019-06-30 14:39:36.022 7faa747e8a80 -1 rocksdb: Invalid argument: /var/lib/ceph/osd/ceph-3: does not exist (create_if_missing is false)
rocksdb path /var/lib/ceph/osd/ceph-3: (22) Invalid argument

Repairing it with the kvstore-tool results in a segmentation fault:

[root@tecoceph osd]# ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-3 repair
*** Caught signal (Segmentation fault) **
 in thread 7ff8fde03a80 thread_name:ceph-kvstore-to
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5d0) [0x7ff8f23925d0]
 2: (main()+0x2c4) [0x55ae6dadb4e4]
 3: (__libc_start_main()+0xf5) [0x7ff8f0d673d5]
 4: (()+0x21dde0) [0x55ae6dbafde0]
2019-06-30 14:39:15.785 7ff8fde03a80 -1 *** Caught signal (Segmentation fault) **
 in thread 7ff8fde03a80 thread_name:ceph-kvstore-to
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5d0) [0x7ff8f23925d0]
 2: (main()+0x2c4) [0x55ae6dadb4e4]
 3: (__libc_start_main()+0xf5) [0x7ff8f0d673d5]
 4: (()+0x21dde0) [0x55ae6dbafde0]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The third OSD crashes with a segfault, and so does every tool I run against it, because of a wrong magic number:

Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:27.805 7fa9bd453d80 -1 Public network was set, but cluster network was not set
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:27.805 7fa9bd453d80 -1 Using public network also for cluster network
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.771 7fa9bd453d80 -1 abort: Corruption: Bad table magic number: expected 9863518390377041911, found 15656361161312523986 in db/002923.sst
Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.831 7fa9bd453d80 -1 *** Caught signal (Aborted) **

Is there any way to recover any of these OSDs?

Karlsruhe Institute of Technology (KIT)
Pervasive Computing Systems – TECO
Prof. Dr. Michael Beigl
IT
Christian Wahl
Vincenz-Prießnitz-Str. 1
Building 07.07., 2nd floor
76131 Karlsruhe, Germany
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com