Hi,

We have seen several issues (mailed about earlier to this list) after the upgrade to Mimic 13.2.8. We decided to downgrade the OSD servers to 13.2.6 to check if the issues would disappear. However, we ran into issues with that as well ...

We have been using the bitmap allocator since Luminous 12.2.12 to combat latency issues on the OSDs, and also used it successfully on Mimic 13.2.6:

  bluestore_allocator = bitmap
  bluefs_allocator = bitmap

When downgrading to 13.2.6 we hit the following assert:

2019-12-27 14:14:16.409 7f2ed2dcce00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block size 3.5 TiB
2019-12-27 14:14:16.409 7f2ed2dcce00 1 bluefs mount
2019-12-27 14:14:16.413 7f2ed2dcce00 -1 /build/ceph-13.2.6/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f2ed2dcce00 time 2019-12-27 14:14:16.414793
/build/ceph-13.2.6/src/os/bluestore/fastbmap_allocator_impl.h: 749: FAILED assert(available >= allocated)

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7f2eca0a497e]
 2: (()+0x2fab07) [0x7f2eca0a4b07]
 3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x44d) [0xc91dbd]
 4: (BlueFS::mount()+0x260) [0xc6e6c0]
 5: (BlueStore::_open_db(bool, bool)+0x17cd) [0xb8f50d]
 6: (BlueStore::_mount(bool, bool)+0x4b7) [0xbbfb77]
 7: (OSD::init()+0x295) [0x761fc5]
 8: (main()+0x367b) [0x64f23b]
 9: (__libc_start_main()+0xf0) [0x7f2ec7c3f830]
 10: (_start()+0x29) [0x718929]

We upgraded the node back to 13.2.8 again, which started without issues.

We did do a "downgrade test" on a test cluster beforehand, and that cluster did not suffer from this issue. It turned out that the test cluster was not using the bitmap allocator (how to check which allocator an OSD is actually running with is sketched below). After enabling the bitmap allocator there on a 13.2.6 node (one that had previously been downgraded but had never run with the bitmap allocator) and restarting it, the node came online just fine. However, an upgrade to 13.2.8 with the bitmap allocator enabled, followed by a downgrade back to 13.2.6, would trigger the same assert.
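In case it is useful to others: which allocator an OSD is actually running with can be queried through its admin socket on the node hosting that OSD. A minimal sketch, assuming osd.0 (the OSD id is just an example):

  ceph daemon osd.0 config get bluestore_allocator
  ceph daemon osd.0 config get bluefs_allocator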
Switching back to the default (stupid) allocator again would work (initially) for 2 out of 3 OSDs. One OSD would fail right away with rocksdb corruption:

2019-12-27 15:10:50.945 7fc77fbcbe00 20 osd.6 1952 register_pg 2.16 0x990c800
2019-12-27 15:10:50.945 7fc77fbcbe00 10 osd.6:2._attach_pg 2.16 0x990c800
2019-12-27 15:10:50.945 7fc77fbcbe00 10 osd.6 1952 pgid 2.0 coll 2.0_head
2019-12-27 15:10:50.945 7fc77fbcbe00 10 osd.6 1952 _make_pg 2.0
2019-12-27 15:10:50.945 7fc77fbcbe00 5 osd.6 pg_epoch: 1952 pg[2.0(unlocked)] enter Initial
2019-12-27 15:10:50.945 7fc77fbcbe00 20 osd.6 pg_epoch: 1952 pg[2.0(unlocked)] enter NotTrimming
2019-12-27 15:10:50.945 7fc77fbcbe00 -1 abort: Corruption: block checksum mismatch: expected 1122551773, got 2333355710 in db/000397.sst offset 57741 size 4044
2019-12-27 15:10:50.949 7fc77fbcbe00 -1 *** Caught signal (Aborted) **
 in thread 7fc77fbcbe00 thread_name:ceph-osd

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0x11390) [0x7fc775520390]
 2: (gsignal()+0x38) [0x7fc774a53428]
 3: (abort()+0x16a) [0x7fc774a5502a]
 4: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::list*)+0x4a8) [0xbff498]
 5: (BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > >*)+0x201) [0xb852e1]
 6: (PG::read_info(ObjectStore*, spg_t, coll_t const&, pg_info_t&, PastIntervals&, unsigned char&)+0x16b) [0x7ecc8b]
 7: (PG::read_state(ObjectStore*)+0x56) [0x81aff6]
 8: (OSD::load_pgs()+0x566) [0x759516]
 9: (OSD::init()+0xcd3) [0x762a03]
 10: (main()+0x367b) [0x64f23b]
 11: (__libc_start_main()+0xf0) [0x7fc774a3e830]
 12: (_start()+0x29) [0x718929]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

And after a restart we got rocksdb messages like this:

...
2019-12-27 15:11:32.322 7fd1c0fbbe00 4 rocksdb: [/build/ceph-13.2.6/src/rocksdb/db/version_set.cc:3088] Recovering from manifest file: MANIFEST-000402
...
...
 -352> 2019-12-27 15:11:11.598 7fa0f0558e00 -1 abort: Corruption: Bad table magic number: expected 9863518390377041911, found 11124 in db/000397.sst
...

We then set osd.6 out.
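(Side note: such an OSD can also be checked offline with ceph-bluestore-tool before deciding what to do with it. A minimal sketch, assuming osd.6 on the default data path, with the daemon stopped; this is just illustrative, not something we base the report above on:

  systemctl stop ceph-osd@6
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-6
)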
After osd.6 had been set out, osd.7 crashed after a while (while backfilling) and would then fail to restart again with the following message:

2019-12-27 15:27:10.833 7f1bb4701e00 4 rocksdb: [/build/ceph-13.2.6/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all background work
2019-12-27 15:27:10.833 7f1bb4701e00 4 rocksdb: [/build/ceph-13.2.6/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2019-12-27 15:27:10.833 7f1bb4701e00 -1 rocksdb: Corruption: CURRENT file does not end with newline
2019-12-27 15:27:10.833 7f1bb4701e00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:
2019-12-27 15:27:10.833 7f1bb4701e00 1 bluefs umount
2019-12-27 15:27:10.833 7f1bb4701e00 1 stupidalloc 0x0x325aee0 shutdown
2019-12-27 15:27:10.833 7f1bb4701e00 1 bdev(0x380a380 /var/lib/ceph/osd/ceph-7/block) close
2019-12-27 15:27:11.093 7f1bb4701e00 1 bdev(0x380a000 /var/lib/ceph/osd/ceph-7/block) close
2019-12-27 15:27:11.345 7f1bb4701e00 -1 osd.7 0 OSD:init: unable to mount object store
2019-12-27 15:27:11.345 7f1bb4701e00 -1  ** ERROR: osd init failed: (5) Input/output error

A restart / reboot of the node did not help.

For those of you still running 13.2.6: I would not recommend upgrading to 13.2.8 (at least not for the storage nodes; mon / mds still seem to work fine).

Does the bitmap allocator modify the OSD's on-disk data in some way? Are you supposed to be able to switch between different allocators?

Thanks,

Stefan

--
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / info@xxxxxx