Re: Monitors crash largely due to the structure of pg-upmap-primary


 



Re-create pool 24 with the necessary number of PGs, remove the upmaps, then delete the temporary pool?

Mind you, I would think that deleting a pool should itself remove the upmaps for its PGs.
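To see how many entries are affected before trying anything, a small filter over the plain-text `ceph osd dump` output can list the `pg_upmap_primary` entries whose pool no longer exists. This is a minimal sketch, assuming the dump prints pools as `pool <id> '<name>' ...` lines and primaries as `pg_upmap_primary <pgid> <osd>` lines (as in the output quoted below); the function name `find_orphaned_upmap_primaries` is hypothetical. It only detects orphans and does not remove them.

```python
import re
from typing import List, Set, Tuple

# Assumed line shapes in plain-text `ceph osd dump` output:
#   pool <id> '<name>' ...
#   pg_upmap_primary <pool>.<seed-hex> <osd>
POOL_RE = re.compile(r"^pool (\d+) '")
UPMAP_PRIMARY_RE = re.compile(r"^pg_upmap_primary (\d+)\.([0-9a-f]+) (\d+)\s*$")


def find_orphaned_upmap_primaries(osd_dump: str) -> List[Tuple[str, int]]:
    """Return (pgid, osd) pairs whose pool ID has no matching 'pool' line."""
    pools: Set[int] = set()
    entries: List[Tuple[int, str, int]] = []
    for raw in osd_dump.splitlines():
        line = raw.strip()
        pool_match = POOL_RE.match(line)
        if pool_match:
            # Record every pool ID that still exists in the OSDMap.
            pools.add(int(pool_match.group(1)))
            continue
        upmap_match = UPMAP_PRIMARY_RE.match(line)
        if upmap_match:
            pool_id = int(upmap_match.group(1))
            pgid = f"{upmap_match.group(1)}.{upmap_match.group(2)}"
            entries.append((pool_id, pgid, int(upmap_match.group(3))))
    # Keep only entries that point at a pool ID missing from the dump.
    return [(pgid, osd) for pool_id, pgid, osd in entries if pool_id not in pools]
```

The dump text could be captured with `subprocess.run(["ceph", "osd", "dump"], capture_output=True, text=True).stdout` and passed to the function. Only pool existence is checked here; an entry whose pool exists but whose PG seed is out of range would not be flagged.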

> On Feb 24, 2025, at 9:12 AM, Michal Strnad <michal.strnad@xxxxxxxxx> wrote:
> 
> Hi.
> 
> We ran into an issue with "pg-upmap-primary" that caused our monitors to crash constantly (around 1,000 crashes per day). According to [1], it should be possible to remove these pg-upmap-primary entries. Unfortunately, we are unable to do so because these PGs, and therefore the pool, no longer exist.
> 
> root@xxxxxxxxxx ~ # ceph osd dump | grep 'pg_upmap_primary' | grep 24.ffc
> pg_upmap_primary 24.ffc 232
> root@xxxxxxxxxx ~ # ceph osd rm-pg-upmap-primary 24.ffc
> Error ENOENT: pgid '24.ffc' does not exist
> root@xxxxxxxxxx ~ # ceph pg dump | grep "^24\."
> dumped all
> 
> Is there any way to remove these structures?
> 
> We also tried upgrading from the current version 18.2.1 to 18.2.4, but on our three-node test cluster this left one of the three monitors and a third of the OSDs unable to start, because of the same pg-upmap-primary entries. Restarting the daemons didn’t help.
> 
> Does anyone have a solution or an idea? This is becoming quite a problem for us.
> 
> Below, I am attaching one of the many monitor crash logs.
> 
> Thank you very much for any advice!
> 
> Of course, we have also created a ticket in the tracker [https://tracker.ceph.com/issues/69760], where the same information I’m sending in this email is documented.
> 
> Michal
> 
> [1] https://tracker.ceph.com/issues/61948#note-32
> 
> {
> "assert_condition": "pg_upmap_primaries.empty()",
> "assert_file": "/builddir/build/BUILD/ceph-18.2.1/src/osd/OSDMap.cc",
> "assert_func": "void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const",
> "assert_line": 3239,
> "assert_msg": "/builddir/build/BUILD/ceph-18.2.1/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7f94216a3640 time 2025-02-02T19:16:03.629964+0100\n/builddir/build/BUILD/ceph-18.2.1/src/osd/OSDMap.cc: 3239: FAILED ceph_assert(pg_upmap_primaries.empty())\n",
> "assert_thread_name": "ms_dispatch",
> "backtrace": [
> "/lib64/libc.so.6(+0x54db0) [0x7f9429054db0]",
> "/lib64/libc.so.6(+0xa365c) [0x7f94290a365c]",
> "raise()",
> "abort()",
> "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f9429d630df]",
> "/usr/lib64/ceph/libceph-common.so.2(+0x163243) [0x7f9429d63243]",
> "/usr/lib64/ceph/libceph-common.so.2(+0x1a0f38) [0x7f9429da0f38]",
> "(OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xe2) [0x55ca54957e22]",
> "(OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1de) [0x55ca549596ae]",
> "(OSDMonitor::build_latest_full(unsigned long)+0x2a3) [0x55ca549599a3]",
> "(OSDMonitor::check_osdmap_sub(Subscription*)+0xc8) [0x55ca5495be98]",
> "(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xf04) [0x55ca54834dd4]",
> "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x6a6) [0x55ca548359c6]",
> "(Monitor::_ms_dispatch(Message*)+0x779) [0x55ca54836d59]",
> "/usr/bin/ceph-mon(+0x2f3dfe) [0x55ca547f1dfe]",
> "(DispatchQueue::entry()+0x52a) [0x7f9429f5766a]",
> "/usr/lib64/ceph/libceph-common.so.2(+0x3e7321) [0x7f9429fe7321]",
> "/lib64/libc.so.6(+0xa1912) [0x7f94290a1912]",
> "/lib64/libc.so.6(+0x3f450) [0x7f942903f450]"
> ],
> "ceph_version": "18.2.1",
> "crash_id": "2025-02-02T18:16:03.632571Z_f5516ed0-6df5-4267-bada-71f5d8d764ba",
> "entity_name": "mon.mon001-clX",
> "os_id": "centos",
> "os_name": "CentOS Stream",
> "os_version": "9",
> "os_version_id": "9",
> "process_name": "ceph-mon",
> "stack_sig": "772ef523b041edc5147d1d9905926fb794d32b2635368a8199f6e2e4f2d688bf",
> "timestamp": "2025-02-02T18:16:03.632571Z",
> "utsname_hostname": "app001.clX",
> "utsname_machine": "x86_64",
> "utsname_release": "5.14.0-402.el9.x86_64",
> "utsname_sysname": "Linux",
> "utsname_version": "#1 SMP PREEMPT_DYNAMIC Thu Dec 21 19:46:35 UTC 2023"
> }
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



