Hi,

We have upgraded one Ceph cluster from 17.2.7 to 18.2.0, and since then we have been having CephFS issues. For example, this morning:

"""
[root@naret-monitor01 ~]# ceph -s
  cluster:
    id:     63334166-d991-11eb-99de-40a6b72108d0
    health: HEALTH_WARN
            1 filesystem is degraded
            3 clients failing to advance oldest client/flush tid
            3 MDSs report slow requests
            6 pgs not scrubbed in time
            29 daemons have recently crashed
…
"""

The ceph orch, ceph crash and ceph fs status commands were hanging. After a "ceph mgr fail" those commands started to respond again.

I then noticed that one MDS had most of the slow operations:

"""
[WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
    mds.cephfs.naret-monitor01.nuakzo(mds.0): 18 slow requests are blocked > 30 secs
    mds.cephfs.naret-monitor01.uvevbf(mds.1): 1683 slow requests are blocked > 30 secs
    mds.cephfs.naret-monitor02.exceuo(mds.2): 1 slow requests are blocked > 30 secs
"""

so I tried to restart it with:

"""
[root@naret-monitor01 ~]# ceph orch daemon restart mds.cephfs.naret-monitor01.uvevbf
Scheduled to restart mds.cephfs.naret-monitor01.uvevbf on host 'naret-monitor01'
"""

After that, the CephFS entered this state:

"""
[root@naret-monitor01 ~]# ceph fs status
cephfs - 198 clients
======
RANK  STATE         MDS                            ACTIVITY      DNS    INOS   DIRS   CAPS
 0    active        cephfs.naret-monitor01.nuakzo  Reqs: 0 /s    17.2k  16.2k  1892   14.3k
 1    active        cephfs.naret-monitor02.ztdghf  Reqs: 0 /s    28.1k  10.3k  752    6881
 2    clientreplay  cephfs.naret-monitor02.exceuo                63.0k  6491   541    66
 3    active        cephfs.naret-monitor03.lqppte  Reqs: 0 /s    16.7k  13.4k  8233   990
          POOL              TYPE      USED   AVAIL
 cephfs.cephfs.meta        metadata   5888M  18.5T
 cephfs.cephfs.data          data      119G   215T
 cephfs.cephfs.data.e_4_2    data     2289G  3241T
 cephfs.cephfs.data.e_8_3    data     9997G   470T
STANDBY MDS
cephfs.naret-monitor03.eflouf
cephfs.naret-monitor01.uvevbf
MDS version: ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
"""

The file system is now completely unresponsive: we can mount it on client nodes, but any operation, even a simple ls, hangs.

During the night we also had a lot of MDS crashes; I can share their content.

Does anybody have an idea how to tackle this problem?

Best,
Giuseppe
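
P.S. In case it is useful, this is roughly how I would collect the recent crash reports to share (just a sketch; <crash-id> is a placeholder for one of the IDs returned by the first command):

"""
# list the crashes that have not been archived yet
[root@naret-monitor01 ~]# ceph crash ls-new

# dump the metadata and backtrace of a single crash
[root@naret-monitor01 ~]# ceph crash info <crash-id>
"""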