Multiple cephfs MDS crashes with same assert_condition: state == LOCK_XLOCK || state == LOCK_XLOCKDONE

Hi

Today we suddenly experienced multiple MDS crashes throughout the day, with an error we have not seen before. We run Octopus 15.2.13 with 4 ranks, 4 standby-replay MDSes, and 1 passive standby. Any input on how to troubleshoot or resolve this would be most welcome.

---

root@hk-cephnode-54:~# ceph crash ls
2021-08-09T08:06:41.573899Z_306a9a10-b9d7-4a68-83a9-f5bd3d700fd7  mds.hk-cephnode-58       
2021-08-09T08:09:03.132838Z_9a62b1fc-6069-4576-974d-2e0464169bb5  mds.hk-cephnode-62       
2021-08-09T11:20:23.776776Z_5a665d00-9862-4d8f-99b5-323cdf441966  mds.hk-cephnode-54       
2021-08-09T11:25:14.213601Z_f47fa398-5582-4da6-8e18-9252bbb52805  mds.hk-cephnode-62       
2021-08-09T12:44:34.190128Z_1e163bf2-6ddf-45ef-a80f-0bf42158da31  mds.hk-cephnode-60       

---

*All the crash logs have the same assert_condition/file/msg*

root@hk-cephnode-54:~# ceph crash info 2021-08-09T12:44:34.190128Z_1e163bf2-6ddf-45ef-a80f-0bf42158da31
{
    "archived": "2021-08-09 12:53:01.429088",
    "assert_condition": "state == LOCK_XLOCK || state == LOCK_XLOCKDONE",
    "assert_file": "/build/ceph/ceph-15.2.13/src/mds/ScatterLock.h",
    "assert_func": "void ScatterLock::set_xlock_snap_sync(MDSContext*)",
    "assert_line": 59,
    "assert_msg": "/build/ceph/ceph-15.2.13/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f0f76853700 time 2021-08-09T14:44:34.185861+0200\n/build/ceph/ceph-15.2.13/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)\n",
    "assert_thread_name": "MR_Finisher",
    "backtrace": [
        "(()+0x12730) [0x7f0f8153d730]",
        "(gsignal()+0x10b) [0x7f0f80e027bb]",
        "(abort()+0x121) [0x7f0f80ded535]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a5) [0x7f0f81f1d0f5]",
        "(()+0x28127c) [0x7f0f81f1d27c]",
        "(MDCache::truncate_inode(CInode*, LogSegment*)+0x305) [0x55ed3b243aa5]",
        "(C_MDS_inode_update_finish::finish(int)+0x14c) [0x55ed3b219dec]",
        "(MDSContext::complete(int)+0x52) [0x55ed3b4156d2]",
        "(MDSIOContextBase::complete(int)+0x9f) [0x55ed3b4158af]",
        "(MDSLogContextBase::complete(int)+0x40) [0x55ed3b415c30]",
        "(Finisher::finisher_thread_entry()+0x19d) [0x7f0f81fab73d]",
        "(()+0x7fa3) [0x7f0f81532fa3]",
        "(clone()+0x3f) [0x7f0f80ec44cf]"
    ],
    "ceph_version": "15.2.13",
    "crash_id": "2021-08-09T12:44:34.190128Z_1e163bf2-6ddf-45ef-a80f-0bf42158da31",
    "entity_name": "mds.hk-cephnode-60",
    "os_id": "10",
    "os_name": "Debian GNU/Linux 10 (buster)",
    "os_version": "10 (buster)",
    "os_version_id": "10",
    "process_name": "ceph-mds",
    "stack_sig": "5f310d14ffe4b2600195c874fba3761c268218711ee4a449413862bb5553fb4c",
    "timestamp": "2021-08-09T12:44:34.190128Z",
    "utsname_hostname": "hk-cephnode-60",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.114-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.4.114-1 (Sun, 09 May 2021 17:13:05 +0200)»
}
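
For reference, the assert that fires is in ScatterLock::set_xlock_snap_sync() in src/mds/ScatterLock.h, reached from MDCache::truncate_inode() according to the backtrace. Below is a rough sketch of that function, reconstructed from the crash fields above; only the assert itself is confirmed by the crash report, the surrounding lines are my assumption of what the 15.2.13 source does:

    void set_xlock_snap_sync(MDSContext *c)
    {
      // The check that fails in every one of our crashes: the MDS expects the
      // file lock to be in XLOCK or XLOCKDONE when truncate_inode() gets here.
      ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE);
      state = LOCK_XLOCK_SNAP;     // assumed: move the lock to the snap-sync xlock state
      add_waiter(WAIT_STABLE, c);  // assumed: defer the truncate completion until the lock is stable
    }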


--- 

root@hk-cephnode-54:~# ceph health detail
HEALTH_WARN 1 daemons have recently crashed
[WRN] RECENT_CRASH: 1 daemons have recently crashed
    mds.hk-cephnode-54 crashed on host hk-cephnode-54 at 2021-08-09T11:20:23.776776Z

root@hk-cephnode-54:~# ceph status
  cluster:
    id:     xxxx
    health: HEALTH_WARN
            1 daemons have recently crashed

  services:
    mon: 3 daemons, quorum hk-cephnode-60,hk-cephnode-61,hk-cephnode-62 (age 4w)
    mgr: hk-cephnode-53(active, since 4h), standbys: hk-cephnode-51, hk-cephnode-52
    mds: cephfs:4 {0=hk-cephnode-60=up:active,1=hk-cephnode-61=up:active,2=hk-cephnode-55=up:active,3=hk-cephnode-57=up:active} 4 up:standby-replay 1 up:standby
    osd: 180 osds: 180 up (since 5d), 180 in (since 2w)
 
  data:
    pools:   9 pools, 2433 pgs
    objects: 118.22M objects, 331 TiB
    usage:   935 TiB used, 990 TiB / 1.9 PiB avail
    pgs:     2433 active+clean
 
  io:
    client:   231 MiB/s rd, 146 MiB/s wr, 900 op/s rd, 4.07k op/s wr


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



