Hi Dejan,

This is a known issue; please see https://tracker.ceph.com/issues/61009.

For the workaround, please see https://tracker.ceph.com/issues/61009#note-26.

Thanks
- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:
Hello,

We have a CephFS with two active MDS daemons. Rank 1 is currently crashing repeatedly with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery?

It seems similar to https://tracker.ceph.com/issues/61009

Crash info:

{
    "assert_condition": "p->first <= start",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",
    "assert_func": "void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]",
    "assert_line": 568,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7fcdaaf8a640 time 2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)\n",
    "assert_thread_name": "md_log_replay",
    "backtrace": [
        "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
        "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fcdb83610ff]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
        "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
        "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
        "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) [0x55a5906e1c8d]",
        "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
        "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
        "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
        "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
        "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
    ],
    "ceph_version": "18.2.2",
    "crash_id": "2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
    "entity_name": "mds.spod19",
    "os_id": "almalinux",
    "os_name": "AlmaLinux",
    "os_version": "9.3 (Shamrock Pampas Cat)",
    "os_version_id": "9.3",
    "process_name": "ceph-mds",
    "stack_sig": "3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
    "timestamp": "2024-05-07T22:26:22.050652Z",
    "utsname_hostname": "spod19.ijs.si",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}

Cheers,
Dejan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
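
For anyone trying to read the assertion in the crash info: it fires in interval_set::erase, which removes a range from a set of non-overlapping intervals and expects that range to lie entirely inside one existing interval. The sketch below is a toy Python model of that precondition only, not Ceph's code. The class IntervalSet, its (start, length) tuple layout, and the helper _find_inc are hypothetical names chosen for illustration.

```python
import bisect


class IntervalSet:
    """Toy model: a sorted list of non-overlapping (start, length) intervals."""

    def __init__(self, intervals=()):
        self.items = sorted(intervals)

    def _find_inc(self, start):
        # Index of the interval that contains `start`, or, if none does,
        # of the first interval that begins after `start`.
        i = bisect.bisect_right(self.items, (start, float("inf"))) - 1
        if i >= 0 and self.items[i][0] + self.items[i][1] > start:
            return i
        return i + 1

    def erase(self, start, length):
        i = self._find_inc(start)
        assert i < len(self.items), "no interval at or after start"
        first, size = self.items[i]
        # Fails when `start` falls in a gap before the interval found,
        # i.e. the range being erased is not in the set at all.
        assert first <= start, "p->first <= start"
        before = start - first
        assert size >= before + length, "range runs past the interval"
        after = size - before - length
        # Keep whatever remains of the interval on either side of the hole.
        pieces = []
        if before:
            pieces.append((first, before))
        if after:
            pieces.append((start + length, after))
        self.items[i:i + 1] = pieces


if __name__ == "__main__":
    s = IntervalSet([(100, 10), (200, 10)])
    s.erase(102, 3)        # inside (100, 10): the interval is split in two
    print(s.items)         # [(100, 2), (105, 5), (200, 10)]
    try:
        s.erase(150, 5)    # 150 lies in the gap between the intervals
    except AssertionError as exc:
        print("assert failed:", exc)
```

In this model the second erase fails because 150 is not covered by any interval, which is the same condition the MDS reports with T = inodeno_t during journal replay. Why the replayed journal entry refers to a range the set does not hold is what the tracker issue covers.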