Re: Ceph cluster not recover after OSD down

Andres Rojas Guerrero <a.rojas@xxxxxxx> · Wed, 5 May 2021 11:44:40 +0200

I have 768 OSDs in the cluster, and it is enough for 32 of them (~4%,
all on the same node) to go down for the data to become inaccessible.
Is it possible to improve this behavior?

# ceph status
  cluster:
    id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
            1 clients failing to respond to capability release
            1 MDSs report slow metadata IOs
            1 MDSs report slow requests
            32 osds down
            1 host (32 osds) down
            Reduced data availability: 238 pgs inactive, 8 pgs down, 230 pgs incomplete
            Degraded data redundancy: 7384199/222506256 objects degraded (3.319%), 2925 pgs degraded, 2925 pgs undersized
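The question above can be made concrete with a rough simulation. This is a minimal sketch under assumptions the thread does not state: a replicated pool with size=3 and min_size=2, 24 hosts of 32 OSDs each (768 / 32), the 16384 PGs counted in the quoted pgs summary (16128 + 238 + 18), and uniform random placement standing in for CRUSH. It compares a hypothetical OSD-level failure domain, where two replicas of one PG may land on the same host, with a host-level one, and counts PGs left below min_size when one host fails.

```python
import random

# Assumptions for illustration only; not taken from the cluster's config.
OSDS_PER_HOST = 32              # from "1 host (32 osds) down"
NUM_HOSTS = 768 // 32           # assumes every host has 32 OSDs -> 24 hosts
NUM_PGS = 16384                 # 16128 + 238 + 18 from the quoted pgs summary
SIZE, MIN_SIZE = 3, 2           # assumed replicated pool settings


def place_pg(rng, failure_domain):
    """Return (host, osd_index) for each replica of one PG.

    Random choice stands in for CRUSH; only the failure-domain
    constraint is modelled.
    """
    if failure_domain == "host":
        # Every replica on a different host.
        hosts = rng.sample(range(NUM_HOSTS), SIZE)
        return [(h, rng.randrange(OSDS_PER_HOST)) for h in hosts]
    if failure_domain == "osd":
        # Distinct OSDs, but several may sit on the same host.
        # Hypothetical numbering: OSD ids are grouped contiguously by host.
        osds = rng.sample(range(NUM_HOSTS * OSDS_PER_HOST), SIZE)
        return [(o // OSDS_PER_HOST, o % OSDS_PER_HOST) for o in osds]
    raise ValueError(f"unknown failure domain: {failure_domain}")


def inactive_pgs_after_host_loss(failure_domain, failed_host=0, seed=1):
    """Count PGs with fewer than MIN_SIZE replicas left after one host fails."""
    rng = random.Random(seed)
    inactive = 0
    for _ in range(NUM_PGS):
        replicas = place_pg(rng, failure_domain)
        surviving = sum(1 for host, _ in replicas if host != failed_host)
        if surviving < MIN_SIZE:
            inactive += 1
    return inactive


for domain in ("host", "osd"):
    print(domain, inactive_pgs_after_host_loss(domain))
```

With a host failure domain no PG can lose more than one replica to a single host, so the count is zero; with an OSD failure domain some PGs lose two or more replicas and stop serving I/O. Whether the pools in this cluster actually use an OSD-level failure domain (or an erasure-coded profile) is not shown in the thread; `ceph osd pool ls detail` and `ceph osd crush rule dump` report the pool settings and CRUSH rules.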

On 5/5/21 at 11:20, Andres Rojas Guerrero wrote:
> They are located on a single node ...
> 
> On 5/5/21 at 11:17, Burkhard Linke wrote:
>> Hi,
>>
>> On 05.05.21 11:07, Andres Rojas Guerrero wrote:
>>> Sorry, I had not understood the problem well. What I see is that
>>> once the OSD fails, the cluster recovers, but the MDS remains faulty:
>>
>>
>> *snipsnap*
>>
>>>      pgs:     1.562% pgs not active
>>>               16128 active+clean
>>>               238   incomplete
>>>               18    down
>>
>>
>> The PGs in down and incomplete state will not allow any I/O, and this
>> leads to the slow ops and the unavailability of the services. 32 OSDs
>> are currently down; if the PG replicas are spread over these OSDs only,
>> there will be no automatic recovery.
>>
>>
>> You will have to bring the OSDs back online to allow recovery. Are those
>> located on a single node or are multiple hosts involved?
>>
>>
>> Regards,
>>
>> Burkhard
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 
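Burkhard's explanation above can be reduced to a small per-PG rule. The sketch below is a simplified model, not Ceph's peering logic: it ignores peering and PG history, which the real down and incomplete states depend on, and only counts how many of a PG's copies sit on OSDs that are still up. `pg_availability` and its arguments are hypothetical names; `needed` is 1 for a replicated pool and k for an erasure-coded k+m pool, and the min_size values used below are assumptions.

```python
from typing import Iterable, Set, Tuple


def pg_availability(acting: Iterable[int], down: Set[int],
                    min_size: int, needed: int = 1) -> Tuple[bool, bool]:
    """Simplified state of one PG after some OSDs go down.

    acting:   OSD ids holding this PG's replicas or EC chunks
    down:     OSD ids that are currently down
    min_size: surviving copies required before the PG serves client I/O
    needed:   copies required to rebuild the data
              (1 for a replicated pool, k for an erasure-coded k+m pool)

    Returns (serves_io, rebuildable). When rebuildable is False, too few
    copies survive on up OSDs and nothing can recover until the down
    OSDs return.
    """
    surviving = sum(1 for osd in acting if osd not in down)
    return surviving >= min_size, surviving >= needed


# Hypothetical numbering: the 32 OSDs of the failed host are ids 0-31.
down = set(range(32))

# Replicated pool, size=3, min_size=2.
print(pg_availability([5, 40, 90], down, min_size=2))  # (True, True): one copy lost
print(pg_availability([5, 17, 90], down, min_size=2))  # (False, True): I/O blocked
print(pg_availability([5, 17, 30], down, min_size=2))  # (False, False): all copies down

# Erasure-coded 4+2 pool, assumed min_size=5, needs 4 chunks to rebuild.
print(pg_availability([1, 2, 40, 41, 42, 43], down, min_size=5, needed=4))  # (False, True)
```

The last replicated case is the situation Burkhard describes: every copy of the PG lives on the failed host, so bringing those OSDs back is the only way to restore it.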

-- 
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.rojas@xxxxxxx
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************