On Thu, Dec 15, 2022 at 3:17 PM Mevludin Blazevic <mblazevic@xxxxxxxxxxxxxx> wrote:
>
> Ceph fs dump:
>
> e62
> enable_multiple, ever_enabled_multiple: 1,1
> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 1
>
> Filesystem 'ceph_fs' (1)
> fs_name ceph_fs
> epoch 62
> flags 12
> created 2022-11-28T12:05:17.203346+0000
> modified 2022-12-15T12:09:14.091724+0000
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 1099511627776
> required_client_features {}
> last_failure 0
> last_failure_osd_epoch 196035
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 1
> in 0
> up {}
> failed 0
> damaged
> stopped
> data_pools [4]
> metadata_pool 5
> inline_data disabled
> balancer
> standby_count_wanted 1
>
> Standby daemons:
>
> [mds.ceph_fs.store5.gnlqqm{-1:152180029} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.135:6800/3548272808,v1:192.168.50.135:6801/3548272808]
> compat {c=[1],r=[1],i=[1]}]
> [mds.ceph_fs.store6.fxgvoj{-1:152416137} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.136:7024/1339959968,v1:192.168.50.136:7025/1339959968]
> compat {c=[1],r=[1],i=[1]}]
> [mds.ceph_fs.store4.mhvpot{-1:152477853} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.134:6800/3098669884,v1:192.168.50.134:6801/3098669884]
> compat {c=[1],r=[1],i=[1]}]
> [mds.ceph_fs.store3.vcnwzh{-1:152481783} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.133:6800/77378788,v1:192.168.50.133:6801/77378788] compat
> {c=[1],r=[1],i=[1]}]
> dumped fsmap epoch 62
>
> Ceph Status:
>
>   cluster:
>     id:     8c774934-1535-11ec-973e-525400130e4f
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem has a failed mds daemon
>             1 filesystem is offline
>             26 daemons have recently crashed
>
>   services:
>     mon: 2 daemons, quorum cephadm-vm,store2 (age 2d)
>     mgr: store1.uevcpd(active, since 2d), standbys: cephadm-vm.zwagng
>     mds: 0/1 daemons up (1 failed), 4 standby
>     osd: 312 osds: 312 up (since 8h), 312 in (since 17h)
>
>   data:
>     volumes: 0/1 healthy, 1 failed
>     pools:   7 pools, 289 pgs
>     objects: 2.62M objects, 9.8 TiB
>     usage:   29 TiB used, 1.9 PiB / 1.9 PiB avail
>     pgs:     286 active+clean
>              3 active+clean+scrubbing+deep
>
>   io:
>     client: 945 KiB/s rd, 3.3 MiB/s wr, 516 op/s rd, 562 op/s wr
>
> Ceph Health detail:
>
> HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds
> daemon; 1 filesystem is offline; 26 daemons have recently crashed
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs ceph_fs is degraded
> [WRN] FS_WITH_FAILED_MDS: 1 filesystem has a failed mds daemon
>     fs ceph_fs has 1 failed mds
> [ERR] MDS_ALL_DOWN: 1 filesystem is offline
>     fs ceph_fs is offline because no MDS is active for it.
> [WRN] RECENT_CRASH: 26 daemons have recently crashed
>     osd.323 crashed on host store7 at 2022-12-12T14:03:23.857874Z
>     osd.323 crashed on host store7 at 2022-12-12T14:03:43.945625Z
>     osd.323 crashed on host store7 at 2022-12-12T14:04:03.282797Z
>     osd.323 crashed on host store7 at 2022-12-12T14:04:22.612037Z
>     osd.323 crashed on host store7 at 2022-12-12T14:04:41.630473Z
>     osd.323 crashed on host store7 at 2022-12-12T14:34:49.237008Z
>     osd.323 crashed on host store7 at 2022-12-12T14:35:09.903922Z
>     osd.323 crashed on host store7 at 2022-12-12T14:35:28.621955Z
>     osd.323 crashed on host store7 at 2022-12-12T14:35:46.985517Z
>     osd.323 crashed on host store7 at 2022-12-12T14:36:05.375758Z
>     osd.323 crashed on host store7 at 2022-12-12T15:01:57.235785Z
>     osd.323 crashed on host store7 at 2022-12-12T15:02:16.581335Z
>     osd.323 crashed on host store7 at 2022-12-12T15:02:33.212653Z
>     osd.323 crashed on host store7 at 2022-12-12T15:02:49.775560Z
>     osd.323 crashed on host store7 at 2022-12-12T15:03:06.303861Z
>     mgr.cephadm-vm.zwagng crashed on host cephadm-vm at 2022-12-13T13:21:41.149773Z
>     mgr.cephadm-vm.zwagng crashed on host cephadm-vm at 2022-12-13T13:22:15.413105Z
>     mgr.cephadm-vm.zwagng crashed on host cephadm-vm at 2022-12-13T13:23:39.888401Z
>     mgr.cephadm-vm.zwagng crashed on host cephadm-vm at 2022-12-13T13:27:56.458529Z
>     mgr.cephadm-vm.zwagng crashed on host cephadm-vm at 2022-12-13T13:31:03.791532Z
>     mgr.cephadm-vm.zwagng crashed on host cephadm-vm at 2022-12-13T13:34:24.023106Z
>     osd.98 crashed on host store3 at 2022-12-13T16:11:38.064735Z
>     mgr.store1.uevcpd crashed on host store1 at 2022-12-13T18:39:33.091261Z
>     osd.322 crashed on host store6 at 2022-12-14T06:06:14.193437Z
>     osd.234 crashed on host store8 at 2022-12-15T02:32:13.009795Z
>     osd.311 crashed on host store8 at 2022-12-15T02:32:18.407978Z
>
> As suggested, I was going to upgrade the Ceph cluster to 16.2.7 to fix
> the MDS issue, but it seems none of the running standby daemons is
> responding.

I suggest also looking at the cephadm logs, which may explain how it's stuck:
https://docs.ceph.com/en/quincy/cephadm/operations/#watching-cephadm-log-messages

Apart from the fact that your MDS daemons have not been upgraded, I don't see a
problem from the CephFS side. You can try removing the daemons; it probably
can't make things worse :)

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
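
[Editor's note: a minimal sketch of the commands the suggestions above map to
on a cephadm-managed cluster. The MDS daemon name is taken from the fs dump
quoted above; the exact names on any given cluster will differ, and these are
untested against this particular cluster.]

    # Follow cephadm's own log channel live, or show the most recent entries
    ceph -W cephadm
    ceph log last cephadm

    # See how the orchestrator currently views the standby MDS daemons
    ceph orch ps --daemon_type mds

    # Remove one stuck standby; cephadm should schedule a replacement
    # according to the mds service spec. Alternatively, redeploy it in place.
    ceph orch daemon rm mds.ceph_fs.store5.gnlqqm --force
    ceph orch daemon redeploy mds.ceph_fs.store5.gnlqqm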