Sorry, I had not understood the problem well. The problem I see is that
once an OSD fails, the cluster recovers but the MDS remains faulty:

# ceph status
  cluster:
    id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
            3 clients failing to respond to capability release
            2 MDSs report slow metadata IOs
            2 MDSs report slow requests
            2 MDSs behind on trimming
            Reduced data availability: 256 pgs inactive, 18 pgs down, 238 pgs incomplete
            22 slow ops, oldest one blocked for 26719 sec, daemons [osd.134,osd.210,osd.244,osd.251,osd.301,osd.514,osd.520,osd.528,osd.642,osd.713]... have slow ops.

  services:
    mon: 3 daemons, quorum ceph2mon01,ceph2mon02,ceph2mon03 (age 23h)
    mgr: ceph2mon02(active, since 6d), standbys: ceph2mon01, ceph2mon03
    mds: nxtclfs:2 {0=ceph2mon01=up:active,1=ceph2mon02=up:active} 1 up:standby
    osd: 768 osds: 736 up (since 7h), 736 in (since 7h)

  data:
    pools:   2 pools, 16384 pgs
    objects: 33.39M objects, 39 TiB
    usage:   64 TiB used, 2.6 PiB / 2.6 PiB avail
    pgs:     1.562% pgs not active
             16128 active+clean
             238   incomplete
             18    down
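
For reference, a quick way to cross-check the recovery flags and to see why the PGs stay
incomplete could look roughly like this (a sketch assuming the standard Nautilus ceph CLI;
<pgid> is just a placeholder for one of the incomplete PGs above, and the "ceph daemon"
command has to be run on the host where that MDS runs):

# ceph osd dump | grep flags          # confirm noout/norecover/nobackfill are really unset
# ceph health detail                  # lists the incomplete/down PGs and the OSDs with slow ops
# ceph pg dump_stuck inactive         # which PGs are stuck, and on which OSDs they sit
# ceph pg <pgid> query                # the "recovery_state" section usually explains why a PG stays incomplete
# ceph daemon mds.ceph2mon01 dump_ops_in_flight   # on the MDS host; shows the requests reported as slow

If the PG query shows peering blocked by OSDs that are still down, that would be consistent
with the incomplete PGs above never becoming active again.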

On 5/5/21 at 11:00, Andres Rojas Guerrero wrote:
> Yes, the principal problem is that the MDSs start to report slow
> requests, the information is no longer accessible, and the cluster
> never recovers.
>
>
> # ceph status
>   cluster:
>     id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
>     health: HEALTH_WARN
>             2 clients failing to respond to capability release
>             2 MDSs report slow metadata IOs
>             1 MDSs report slow requests
>             2 MDSs behind on trimming
>             Reduced data availability: 238 pgs inactive, 8 pgs down, 230 pgs incomplete
>             Degraded data redundancy: 1400453/220552172 objects degraded (0.635%), 461 pgs degraded, 464 pgs undersized
>             241 slow ops, oldest one blocked for 638 sec, daemons [osd.101,osd.127,osd.155,osd.166,osd.172,osd.189,osd.200,osd.210,osd.214,osd.233]... have slow ops.
>
>   services:
>     mon: 3 daemons, quorum ceph2mon01,ceph2mon02,ceph2mon03 (age 25h)
>     mgr: ceph2mon02(active, since 6d), standbys: ceph2mon01, ceph2mon03
>     mds: nxtclfs:2 {0=ceph2mon01=up:active,1=ceph2mon02=up:active} 1 up:standby
>     osd: 768 osds: 736 up (since 11m), 736 in (since 95s); 416 remapped pgs
>
>   data:
>     pools:   2 pools, 16384 pgs
>     objects: 33.40M objects, 39 TiB
>     usage:   63 TiB used, 2.6 PiB / 2.6 PiB avail
>     pgs:     1.489% pgs not active
>              1400453/220552172 objects degraded (0.635%)
>              15676 active+clean
>              285   active+undersized+degraded+remapped+backfill_wait
>              230   incomplete
>              176   active+undersized+degraded+remapped+backfilling
>              8     down
>              6     peering
>              3     active+undersized+remapped
>
> On 5/5/21 at 10:54, David Caro wrote:
>>
>> Can you share more information?
>>
>> The output of 'ceph status' when the OSD is down would help; 'ceph health detail' could also be useful.
>>
>> On 05/05 10:48, Andres Rojas Guerrero wrote:
>>> Hi, I have a Nautilus cluster, version 14.2.6, and I have noticed that
>>> when some OSDs go down the cluster doesn't start to recover. I have
>>> checked that the option noout is unset.
>>>
>>> What could be the reason for this behavior?
>>
>

-- 
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas@xxxxxxx
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx