MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

Hello,

We have a CephFS file system with two active MDS daemons. Rank 1 is currently crashing repeatedly with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery?
It seems similar to https://tracker.ceph.com/issues/61009
Crash info:

{
    "assert_condition": "p->first <= start",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",
    "assert_func": "void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]",
    "assert_line": 568,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7fcdaaf8a640 time 2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)\n",
    "assert_thread_name": "md_log_replay",
    "backtrace": [
        "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
        "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
        "raise()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fcdb83610ff]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
        "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
        "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
        "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) [0x55a5906e1c8d]",
        "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
        "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
        "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
        "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
        "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
    ],
    "ceph_version": "18.2.2",
    "crash_id": "2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
    "entity_name": "mds.spod19",
    "os_id": "almalinux",
    "os_name": "AlmaLinux",
    "os_version": "9.3 (Shamrock Pampas Cat)",
    "os_version_id": "9.3",
    "process_name": "ceph-mds",
    "stack_sig": "3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
    "timestamp": "2024-05-07T22:26:22.050652Z",
    "utsname_hostname": "spod19.ijs.si",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}


Cheers,
Dejan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
