Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

Xiubo Li <xiubli@xxxxxxxxxx> · Wed, 8 May 2024 15:53:12 +0800

Hi Dejan,

This is a known issue; please see https://tracker.ceph.com/issues/61009.

For the workaround, please see https://tracker.ceph.com/issues/61009#note-26.

Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:
Hello,

We have a CephFS file system with two active MDS daemons. Rank 1 is currently crashing repeatedly with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery?
It seems similar to https://tracker.ceph.com/issues/61009.
Crash info:

{
     "assert_condition": "p->first <= start",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",
     "assert_func": "void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]",
     "assert_line": 568,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7fcdaaf8a640 time 2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)\n",
     "assert_thread_name": "md_log_replay",
     "backtrace": [
         "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
         "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
         "raise()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fcdb83610ff]",
         "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
         "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
         "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
         "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) [0x55a5906e1c8d]",
         "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
         "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
         "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
         "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
         "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
     ],
     "ceph_version": "18.2.2",
     "crash_id": "2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
     "entity_name": "mds.spod19",
     "os_id": "almalinux",
     "os_name": "AlmaLinux",
     "os_version": "9.3 (Shamrock Pampas Cat)",
     "os_version_id": "9.3",
     "process_name": "ceph-mds",
     "stack_sig": "3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
     "timestamp": "2024-05-07T22:26:22.050652Z",
     "utsname_hostname": "spod19.ijs.si",
     "utsname_machine": "x86_64",
     "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
     "utsname_sysname": "Linux",
     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}

Cheers,
Dejan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx