Hi Wissem,
first of all, the bug wasn't fixed by the PR you're referring to - it just
added additional log output when the problem is detected.
Unfortunately the bug isn't fixed yet, as the root cause of the zombie
spanning blobs' appearance is still unclear. The relevant ticket is
https://tracker.ceph.com/issues/48216
There is a workaround though: ceph-bluestore-tool's repair command
detects zombie spanning blobs and removes them, which should
eliminate the assertion for a while.
I'd recommend running fsck/repair periodically, as it looks like your
cluster is exposed to the problem and the zombies will likely come back.
It's crucial to keep their number below 32K per PG to avoid the assertion.
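A rough sketch of how such a periodic offline repair could be scripted for one OSD is below. It only builds and (by default) prints the commands. The systemd unit name ceph-osd@<id> and the data path /var/lib/ceph/osd/ceph-<id> are assumptions for a package-based, non-containerized deployment, and the function names are made up for illustration. The OSD has to be stopped while ceph-bluestore-tool works on it.

```python
import shlex
import subprocess


def bluestore_repair_plan(osd_id, osd_root="/var/lib/ceph/osd"):
    """Return the commands, in order, for an offline repair of one OSD.

    Assumptions (adjust to your deployment): the OSD runs as the systemd
    unit ceph-osd@<id> and its data directory is <osd_root>/ceph-<id>.
    """
    path = f"{osd_root}/ceph-{osd_id}"
    unit = f"ceph-osd@{osd_id}"
    return [
        ["systemctl", "stop", unit],                        # tool needs the OSD offline
        ["ceph-bluestore-tool", "repair", "--path", path],  # checks and fixes what it finds
        ["systemctl", "start", unit],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step, so a failed
            # repair leaves the OSD down for manual inspection.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run for a hypothetical osd.3: nothing is executed, only printed.
    run_plan(bluestore_repair_plan(3))
```

Run it one OSD at a time and wait for the cluster to become healthy again before moving on to the next one.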
Thanks,
Igor
On 2/17/2022 1:41 PM, Wissem MIMOUNA wrote:
Dear all,
Some OSDs on our Ceph cluster crash with no explanation.
Stopping and starting the crashed OSD daemon fixed the issue, but this has happened a few times and I need to understand the reason.
For your information, the error has been fixed by the log change in the Octopus release (https://github.com/ceph/ceph/pull/27911).
Below are the logs related to the crash:
"process_name": "ceph-osd",
"entity_name": "osd.x",
"ceph_version": "15.2.15",
"utsname_hostname": "",
"utsname_sysname": "Linux",
"utsname_release": "4.15.0-162-generic",
"utsname_version":
"utsname_machine": "x86_64",
"os_name": "Ubuntu",
"os_id": "ubuntu",
"os_version_id": "18.04",
"os_version": "18.04.6 LTS (Bionic Beaver)",
"assert_condition": "abort",
"assert_func": "bid_t BlueStore::ExtentMap::allocate_spanning_blob_id()",
"assert_file": "/build/ceph-15.2.15/src/os/bluestore/BlueStore.cc",
"assert_line": 2664,
"assert_thread_name": "tp_osd_tp",
"assert_msg": "/build/ceph-15.2.15/src/os/bluestore/BlueStore.cc: In function 'bid_t BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 7f6d37800700 time 2022-02-17T09:41:55.108101+0100\n/build/ceph-15.2.15/src/os/bluestore/BlueStore.cc: 2664: ceph_abort_msg(\"no available blob id\")\n",
"backtrace": [
"(()+0x12980) [0x7f6d59516980]",
"(gsignal()+0xc7) [0x7f6d581c8fb7]",
"(abort()+0x141) [0x7f6d581ca921]",
"(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b2) [0x55ddc61f245f]",
"(BlueStore::ExtentMap::allocate_spanning_blob_id()+0x104) [0x55ddc674b594]",
"(BlueStore::ExtentMap::reshard(KeyValueDB*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x1408) [0x55ddc674c9c8]",
"(BlueStore::_record_onode(boost::intrusive_ptr<BlueStore::Onode>&, std::shared_ptr<KeyValueDB::TransactionImpl>&)+0x91c) [0x55ddc674f4ec]",
"(BlueStore::_txc_write_nodes(BlueStore::TransContext*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x7e) [0x55ddc6751b4e]",
"(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2fc) [0x55ddc677892c]",
"(non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x55ddc63eef44]",
"(ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&)+0x9cd) [0x55ddc65cb95d]",
"(ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x23d) [0x55ddc65e3c2d]",
"(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x55ddc643b157]",
"(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6fd) [0x55ddc63ddddd]",
"(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17b) [0x55ddc62618bb]",
"(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x67) [0x55ddc64bf167]",
"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90c) [0x55ddc627ef4c]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x55ddc68d1d0c]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55ddc68d4f60]",
"(()+0x76db) [0x7f6d5950b6db]",
"(clone()+0x3f) [0x7f6d582ab71f]"
]
Best Regards
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx