Hi everyone,

I've been running rbd-mirror between my old Ceph cluster (16.2.10) and my new cluster (18.2.2). I'm using journaling mode on a pool that contains 7,500 images. Everything ran fine until it had processed about 5,608 images. Now rbd-mirror keeps crashing with the following output:

2024-07-19T05:49:32.425+0000 7f582b3fd6c0  0 set uid:gid to 167:167 (ceph:ceph)
2024-07-19T05:49:32.425+0000 7f582b3fd6c0  0 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable), process rbd-mirror, pid 7
2024-07-19T05:49:32.429+0000 7f582b3fd6c0  1 mgrc service_daemon_register rbd-mirror.3606956688 metadata {arch=x86_64, ceph_release=pacific, ceph_version=ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable), ceph_version_short=16.2.10, container_hostname=mon-001, container_image=quay.io/ceph/ceph@sha256:2b68483bcd050472a18e73389c0e1f3f70d34bb7abf733f692e88c935ea0a6bd, cpu=Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz, distro=centos, distro_description=CentOS Stream 8, distro_version=8, hostname=mon-001, id=mon-001.lcqrti, instance_id=3606956688, kernel_description=#1 SMP Mon Jul 18 17:42:52 UTC 2022, kernel_version=4.18.0-408.el8.x86_64, mem_swap_kb=4194300, mem_total_kb=131393360, os=Linux}
2024-07-19T05:50:28.305+0000 7f5812582700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7f5812582700 time 2024-07-19T05:50:28.303536+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/common/Thread.cc: 165: FAILED ceph_assert(ret == 0)

 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f58218b6de8]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x277002) [0x7f58218b7002]
 3: /usr/lib64/ceph/libceph-common.so.2(+0x362fd7) [0x7f58219a2fd7]
 4: (CommonSafeTimer<std::mutex>::init()+0x1fe) [0x7f58219a963e]
 5: (journal::Journaler::Threads::Threads(ceph::common::CephContext*)+0x2fc) [0x55c9b33c6ddc]
 6: (journal::Journaler::Journaler(librados::v14_2_0::IoCtx&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, journal::Settings const&, journal::CacheManagerHandler*)+0x50) [0x55c9b33c6f10]
 7: (librbd::Journal<librbd::ImageCtx>::get_tag_owner(librados::v14_2_0::IoCtx&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, librbd::asio::ContextWQ*, Context*)+0x19f) [0x55c9b2fa65af]
 8: (librbd::mirror::GetInfoRequest<librbd::ImageCtx>::get_journal_tag_owner()+0x210) [0x55c9b31869f0]
 9: (librbd::mirror::GetInfoRequest<librbd::ImageCtx>::handle_get_mirror_image(int)+0x8c8) [0x55c9b3189d78]
 10: /lib64/librados.so.2(+0xa8546) [0x7f582aedb546]
 11: /lib64/librados.so.2(+0xc17e5) [0x7f582aef47e5]
 12: /lib64/librados.so.2(+0xc3742) [0x7f582aef6742]
 13: /lib64/librados.so.2(+0xc914a) [0x7f582aefc14a]
 14: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f581fb03ba3]
 15: /lib64/libpthread.so.0(+0x81ca) [0x7f5820cec1ca]
 16: clone()

Has anyone encountered a similar problem, or does anyone have any insight into what might be causing this crash?

Thanks in advance for your help.