Re: resharding RocksDB after upgrade to Pacific breaks OSDs

Hi,

This seems like a dangerous operation to me. I tried the same on two different virtual clusters, Reef and Pacific (both upgraded from previous releases). In Reef the reshard fails altogether and the OSD fails to start; I had to recreate it. In Pacific the reshard reports success, but the OSD fails to start as well, with the same stack trace as yours. I wasn't aware of this resharding operation until now, but is it really safe? I don't know how to fix it; I just recreated the OSDs.


Quoting Denis Polom <denispolom@xxxxxxxxx>:

Hi

We upgraded our Ceph cluster from the latest Octopus to Pacific 16.2.14 and then followed the docs (https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#rocksdb-sharding) to reshard RocksDB on our OSDs.

Although the reshard reports success, the OSD fails to start.

# ceph-bluestore-tool  --path /var/lib/ceph/osd/ceph-5/ --sharding="m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p" reshard
reshard success

Oct 30 12:44:17 octopus2 ceph-osd[4521]: /build/ceph-16.2.14/src/kv/RocksDBStore.cc: 1223: FAILED ceph_assert(recreate_mode)
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x564047cb92b2]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  2: /usr/bin/ceph-osd(+0xaa948a) [0x564047cb948a]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  3: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1609) [0x564048794829]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  4: (BlueStore::_open_db(bool, bool, bool)+0x601) [0x564048240421]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  5: (BlueStore::_open_db_and_around(bool, bool)+0x26b) [0x5640482a5f8b]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  6: (BlueStore::_mount()+0x9c) [0x5640482a896c]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  7: (OSD::init()+0x38a) [0x564047daacea]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  8: main()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  9: __libc_start_main()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  10: _start()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:      0> 2023-10-30T12:44:17.088+0000 7f4971ed2100 -1 *** Caught signal (Aborted) **
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  in thread 7f4971ed2100 thread_name:ceph-osd
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f4972921730]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  2: gsignal()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  3: abort()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19c) [0x564047cb9303]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  5: /usr/bin/ceph-osd(+0xaa948a) [0x564047cb948a]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  6: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1609) [0x564048794829]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  7: (BlueStore::_open_db(bool, bool, bool)+0x601) [0x564048240421]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  8: (BlueStore::_open_db_and_around(bool, bool)+0x26b) [0x5640482a5f8b]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  9: (BlueStore::_mount()+0x9c) [0x5640482a896c]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  10: (OSD::init()+0x38a) [0x564047daacea]
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  11: main()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  12: __libc_start_main()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  13: _start()
Oct 30 12:44:17 octopus2 ceph-osd[4521]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Oct 30 12:44:17 octopus2 ceph-osd[4521]:     -1> 2023-10-30T12:44:17.084+0000 7f4971ed2100 -1 /build/ceph-16.2.14/src/kv/RocksDBStore.cc: In function 'int RocksDBStore::do_open(std::ostream&, bool, bool, const string&)' thread 7f4971ed2100 time 2023-10-30T12:44:17.087172+0000

I've submitted a bug report at https://tracker.ceph.com/issues/63353, but maybe the community here has some ideas on how to fix it, unless it's really a bug.
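
One cheap thing to rule out before resharding is a malformed sharding string. The sketch below is only an illustration and not part of ceph-bluestore-tool. It assumes that each whitespace-separated column definition has the shape PREFIX[(SHARDS[,LOW-HIGH])][=OPTIONS] and that options contain no spaces; the helper names parse_sharding and duplicate_prefixes are made up for this example. It lists prefixes that are defined more than once. Prefixes are case-sensitive, so it may also be worth comparing the string character by character against the one in the docs.

```python
import re
from collections import Counter

# Assumed shape of one column definition (an assumption for this sketch,
# not taken from the Ceph sources): PREFIX[(SHARDS[,LOW-HIGH])][=OPTIONS]
COLUMN_RE = re.compile(
    r"^(?P<prefix>[^\s(=]+)"
    r"(?:\((?P<shards>\d+)(?:,(?P<low>\d+)-(?P<high>\d*))?\))?"
    r"(?:=(?P<options>.*))?$"
)


def parse_sharding(spec):
    """Split a sharding string into one dict per column definition."""
    columns = []
    for token in spec.split():
        m = COLUMN_RE.match(token)
        if m is None:
            raise ValueError(f"unrecognized column definition: {token!r}")
        columns.append({
            "prefix": m["prefix"],
            # None means no shard count was given for this prefix.
            "shards": int(m["shards"]) if m["shards"] else None,
            "options": m["options"] or "",
        })
    return columns


def duplicate_prefixes(spec):
    """Return the prefixes that appear in more than one definition."""
    counts = Counter(col["prefix"] for col in parse_sharding(spec))
    return sorted(p for p, n in counts.items() if n > 1)


# The string exactly as passed to ceph-bluestore-tool above.
spec = "m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p"
print(duplicate_prefixes(spec))
```

For the string used above this prints ['p'], because p appears both as p(3,0-12) and as the last definition. This check cannot tell whether that is related to the assert.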

Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

