Hi,

We ran into an issue with pg-upmap-primary entries that caused our monitors to crash massively (around 1,000 crashes per day). According to [1], it should be possible to remove these pg-upmap-primary entries. Unfortunately, we are unable to do so because the affected PGs, and therefore their pool, no longer exist.
root@xxxxxxxxxx ~ # ceph osd dump | grep 'pg_upmap_primary' | grep 24.ffc
pg_upmap_primary 24.ffc 232
root@xxxxxxxxxx ~ # ceph osd rm-pg-upmap-primary 24.ffc
Error ENOENT: pgid '24.ffc' does not exist
root@xxxxxxxxxx ~ # ceph pg dump | grep "^24\."
dumped all

Is there any way to remove these structures?

We also tried upgrading from the current version 18.2.1 to 18.2.4, but on our three-node test cluster this left one of the three monitors, along with a third of the OSDs, unable to start because of the same structures. Restarting the daemons didn't help.
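For what it's worth, the manual check above (grep the osd dump for a pg_upmap_primary entry, then confirm its pool has no PGs in the pg dump) can be scripted to list every orphaned entry at once. This is only a minimal Python sketch, assuming that `ceph osd dump --format json` exposes a top-level "pools" list (each item with an integer "pool" id) and a "pg_upmap_primaries" list (each item with a "pgid" string such as "24.ffc" and a "primary_osd"); those key names are an assumption and should be checked against your release. It only finds the entries; it does not remove them.

```python
def find_orphaned_upmap_primaries(osd_dump: dict) -> list:
    """Return pg_upmap_primary entries whose pool is absent from the OSD map.

    ASSUMPTION: osd_dump is the parsed output of `ceph osd dump --format json`,
    with "pools" items carrying an integer "pool" id and "pg_upmap_primaries"
    items carrying a "pgid" string. Verify these key names on your version.
    """
    # Collect the ids of all pools that still exist.
    existing_pools = {p["pool"] for p in osd_dump.get("pools", [])}
    orphaned = []
    for entry in osd_dump.get("pg_upmap_primaries", []):
        # A pgid has the form "<pool id in decimal>.<placement seed in hex>",
        # so the pool id is everything before the first dot.
        pool_id = int(entry["pgid"].split(".", 1)[0])
        if pool_id not in existing_pools:
            orphaned.append(entry)
    return orphaned
```

To use it, pass in `json.loads()` of the dump output saved from `ceph osd dump --format json`; each returned entry corresponds to a pg_upmap_primary line like the 24.ffc one above.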
Does anyone have a solution or an idea? This is becoming quite a problem for us.
Below, I am attaching one of the many monitor crash logs. Thank you very much for any advice! Of course, we have also created a ticket in the tracker [https://tracker.ceph.com/issues/69760] that documents the same information as this email.
Michal

[1] https://tracker.ceph.com/issues/61948#note-32

{
    "assert_condition": "pg_upmap_primaries.empty()",
    "assert_file": "/builddir/build/BUILD/ceph-18.2.1/src/osd/OSDMap.cc",
    "assert_func": "void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const",
    "assert_line": 3239,
    "assert_msg": "/builddir/build/BUILD/ceph-18.2.1/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7f94216a3640 time 2025-02-02T19:16:03.629964+0100\n/builddir/build/BUILD/ceph-18.2.1/src/osd/OSDMap.cc: 3239: FAILED ceph_assert(pg_upmap_primaries.empty())\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "/lib64/libc.so.6(+0x54db0) [0x7f9429054db0]",
        "/lib64/libc.so.6(+0xa365c) [0x7f94290a365c]",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f9429d630df]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x163243) [0x7f9429d63243]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x1a0f38) [0x7f9429da0f38]",
        "(OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xe2) [0x55ca54957e22]",
        "(OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1de) [0x55ca549596ae]",
        "(OSDMonitor::build_latest_full(unsigned long)+0x2a3) [0x55ca549599a3]",
        "(OSDMonitor::check_osdmap_sub(Subscription*)+0xc8) [0x55ca5495be98]",
        "(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xf04) [0x55ca54834dd4]",
        "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x6a6) [0x55ca548359c6]",
        "(Monitor::_ms_dispatch(Message*)+0x779) [0x55ca54836d59]",
        "/usr/bin/ceph-mon(+0x2f3dfe) [0x55ca547f1dfe]",
        "(DispatchQueue::entry()+0x52a) [0x7f9429f5766a]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x3e7321) [0x7f9429fe7321]",
        "/lib64/libc.so.6(+0xa1912) [0x7f94290a1912]",
        "/lib64/libc.so.6(+0x3f450) [0x7f942903f450]"
    ],
    "ceph_version": "18.2.1",
    "crash_id": "2025-02-02T18:16:03.632571Z_f5516ed0-6df5-4267-bada-71f5d8d764ba",
    "entity_name": "mon.mon001-clX",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "9",
    "os_version_id": "9",
    "process_name": "ceph-mon",
    "stack_sig": "772ef523b041edc5147d1d9905926fb794d32b2635368a8199f6e2e4f2d688bf",
    "timestamp": "2025-02-02T18:16:03.632571Z",
    "utsname_hostname": "app001.clX",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-402.el9.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Thu Dec 21 19:46:35 UTC 2023"
}
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx