Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

On 5/8/24 17:36, Dejan Lesjak wrote:
Hi Xiubo,

On 8. 05. 24 09:53, Xiubo Li wrote:
Hi Dejan,

This is a known issue; please see https://tracker.ceph.com/issues/61009.

For the workaround please see https://tracker.ceph.com/issues/61009#note-26.
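
(In short, the workaround discussed there amounts to mounting the clients without nowsync, i.e. with the wsync mount option on the kernel client. A rough sketch only, with the monitor address, client name and mount point as placeholders:

    mount -t ceph 192.0.2.10:6789:/ /mnt/cephfs -o name=myclient,wsync

The same wsync option can also be added to the corresponding /etc/fstab entry.)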

Thank you for the links. Unfortunately I'm not sure I understand the workaround: the clients should be mounted without nowsync, but the clients never get to the point of mounting, because the MDS is not available yet while it is doing replay. Rebooting the clients does not seem to help either, as they still appear in the client list (from "ceph tell mds.1 client ls").

Hi Dejan,

We are discussing the same issue in the Slack thread https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1715189877518529.

Thanks

- Xiubo


Thanks,
Dejan

Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:
Hello,

We have CephFS with two active MDS daemons. Currently rank 1 is repeatedly crashing with FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way to work around this and get back to an accessible file system, or should we start with disaster recovery?
It seems similar to https://tracker.ceph.com/issues/61009
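
(Before any destructive recovery step, the usual advice from the CephFS disaster-recovery documentation is to inspect and back up the journal of the affected rank first, roughly:

    cephfs-journal-tool --rank=<fs_name>:1 journal inspect
    cephfs-journal-tool --rank=<fs_name>:1 journal export backup.bin

where <fs_name> is a placeholder for the actual file system name.)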
Crash info:

{
     "assert_condition": "p->first <= start",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",      "assert_func": "void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]",
     "assert_line": 568,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' thread 7fcdaaf8a640 time 2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h: 568: FAILED ceph_assert(p->first <= start)\n",
     "assert_thread_name": "md_log_replay",
     "backtrace": [
         "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
         "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
         "raise()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7fcdb83610ff]",          "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
         "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
         "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
         "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) [0x55a5906e1c8d]",
         "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
         "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
         "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
         "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
         "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
     ],
     "ceph_version": "18.2.2",
     "crash_id": "2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
     "entity_name": "mds.spod19",
     "os_id": "almalinux",
     "os_name": "AlmaLinux",
     "os_version": "9.3 (Shamrock Pampas Cat)",
     "os_version_id": "9.3",
     "process_name": "ceph-mds",
     "stack_sig": "3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
     "timestamp": "2024-05-07T22:26:22.050652Z",
     "utsname_hostname": "spod19.ijs.si",
     "utsname_machine": "x86_64",
     "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
     "utsname_sysname": "Linux",
     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}
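
(For reference, crash reports in this format come from the crash module and can be listed and retrieved with, e.g.:

    ceph crash ls
    ceph crash info <crash_id>

where <crash_id> is the id shown in the report above.)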


Cheers,
Dejan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx