Hey,
I have a machine with 5 OSD drives in a VM and 5 more drives on the same host machine. I've made this mistake once before: running ceph-volume activate --all for the host machine's drives also takes over the 5 drives belonging to the VM and corrupts them.
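For next time, I think the safer flow is to first check which OSDs ceph-volume actually knows about on the host and then activate them individually rather than with --all; something like this (untested here, taken from the ceph-volume docs):

$ ceph-volume lvm list                          # list the LVs/devices ceph-volume manages plus their OSD ids and fsids
$ ceph-volume lvm activate <osd-id> <osd-fsid>  # activate only the host's own OSDs, one by one

That way the 5 drives that belong to the VM never get touched from the host side.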
I've actually lost data this time. The pool is erasure coded 6+3, so a PG only survives losing 3 of its 9 shards; taking out 5 drives at once means I lost a small number of PGs (6). Repair gives this message:
$ ceph-bluestore-tool repair --deep true --path /var/lib/ceph/osd/ceph-0
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool, bool)' thread 7f21c3c6d980 time 2019-07-25 23:19:44.820537
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: 848: FAILED assert(r != q->second->file_map.end())
ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f21c56c4b5e]
2: (()+0x2c4cb7) [0x7f21c56c4cb7]
3: (BlueFS::_replay(bool, bool)+0x4082) [0x56432ef954a2]
4: (BlueFS::mount()+0xff) [0x56432ef958ef]
5: (BlueStore::_open_db(bool, bool)+0x81c) [0x56432eff1a1c]
6: (BlueStore::_fsck(bool, bool)+0x337) [0x56432f00e0a7]
7: (main()+0xf0a) [0x56432eea7dca]
8: (__libc_start_main()+0xeb) [0x7f21c4b7109b]
9: (_start()+0x2a) [0x56432ef700fa]
*** Caught signal (Aborted) **
in thread 7f21c3c6d980 thread_name:ceph-bluestore-
2019-07-25 23:19:44.817 7f21c3c6d980 -1 /build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_replay(bool, bool)' thread 7f21c3c6d980 time 2019-07-25 23:19:44.820537
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: 848: FAILED assert(r != q->second->file_map.end())
I have two OSDs that don't start, but they at least make it further into the repair:
$ ceph-bluestore-tool repair --deep true --path /var/lib/ceph/osd/ceph-8
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: 0~21a (5) Input/output error
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661, expected 0x9344f85e, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: 0~21a (5) Input/output error
... still running ....
I've read through the list archives and, unlike others who have come across this, I'm not able to recover the content without the lost OSDs.
These PGs are backing a CephFS instance, so ideally:
1. I'd be able to recover the 6 missing PGs from 3 of the 5 OSDs that are in a broken state...
or, less desirably,
2. Figure out how to map the lost PGs back to the CephFS files they held, so that I can work out what's lost and what remains (a rough sketch of how I think that mapping could work is right below).
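For option 2, my rough understanding is that CephFS stores each file's data in RADOS objects named <inode number in hex>.<object index in hex> in the data pool, so I should be able to walk the filesystem and check which files have objects that map into the lost PGs. The sketch below is untested, assumes the data pool is called cephfs_data and the filesystem is mounted at /mnt/cephfs (adjust both), and only checks each file's first object (00000000), so a thorough pass would have to loop over every object index of large files:

LOST_PGS="2.1a 2.3f"    # placeholder: the PG ids reported lost/incomplete by 'ceph health detail'
find /mnt/cephfs -type f | while read -r f; do
    # CephFS data object names are <inode in hex>.<object index in hex>
    ino_hex=$(printf '%x' "$(stat -c %i "$f")")
    # Ask the cluster which PG the file's first object hashes to;
    # the PG id is the parenthesised token in the 'ceph osd map' output
    pg=$(ceph osd map cephfs_data "${ino_hex}.00000000" | grep -oE '\([0-9]+\.[0-9a-f]+\)' | tr -d '()')
    for lost in $LOST_PGS; do
        [ "$pg" = "$lost" ] && echo "possibly lost: $f (pg $pg)"
    done
done

If that works it would at least give me a list of files to treat as suspect, and anything not flagged should still be intact.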