Re: ceph Nautilus lost two disk over night everything hangs

Hello,

In the meantime, Ceph is running normally again, except for the two OSDs that are down because of the failed disks.

What really helped in my situation was lowering min_size from 5 (k+1) to 4 (k) in my 4+2 erasure-coded setup. I am also grateful to the developer who added the helpful hint for this situation to ceph health detail.
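For anyone following along, the arithmetic behind that hint can be sketched as follows. This is just an illustrative snippet; the min_size_bounds helper is not a Ceph API, and the 4+2 values come from this thread:

```python
# For a k+m erasure-coded pool, Ceph's default min_size is k+1.
# A PG can serve I/O while at least min_size shards are available;
# data is recoverable as long as at least k shards survive.
def min_size_bounds(k, m):
    """Return (lowest_safe, default) min_size for a k+m EC profile."""
    return k, k + 1

k, m = 4, 2                       # the 4+2 profile from this thread
lowest, default = min_size_bounds(k, m)
print(default)                    # 5 -> the usual k+1 setting
print(lowest)                     # 4 -> lowering to k lets a PG that
                                  #      lost two shards go active again
```

Lowering min_size to k trades away one layer of safety (writes are accepted with no spare shard), so it is usually a temporary measure until backfill completes.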

Thanks very much to everyone who answered my request to help out.

What is left now is to replace the disks and then bring the two OSDs back up.

Have a nice day
Rainer

On 30.03.21 at 13:32, Burkhard Linke wrote:
Hi,

On 30.03.21 13:05, Rainer Krienke wrote:
Hello,

yes, your assumptions are correct: pxa-rbd is the metadata pool for pxa-ec, which uses a 4+2 erasure-coding profile.

In the last hours Ceph repaired most of the damage. One inactive PG remained, and ceph health detail then told me:

---------
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg incomplete; 15 daemons have recently crashed; 150 slow ops, oldest one blocked for 26716 sec, daemons [osd.60,osd.67] have slow ops.
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg incomplete
    pg 36.15b is remapped+incomplete, acting [60,2147483647,23,96,2147483647,36] (reducing pool pxa-ec min_size from 5 may help; search ceph.com/docs for 'incomplete')


*snipsnap*

2147483647 is 0x7fffffff (INT32_MAX), the sentinel value CRUSH uses to mark a shard with no associated OSD (CRUSH_ITEM_NONE). So this PG does not have six independent OSDs, and no backfilling is happening since there are no targets to backfill to.
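You can see the sentinel at work by decoding the acting set from the health output above (a small illustrative snippet, not a Ceph API):

```python
# CRUSH marks "no OSD assigned" with 0x7fffffff (INT32_MAX),
# which shows up as 2147483647 in pg dumps and health output.
CRUSH_ITEM_NONE = 0x7FFFFFFF

# acting set of pg 36.15b, copied from the health output above
acting = [60, 2147483647, 23, 96, 2147483647, 36]

missing = sum(1 for osd in acting if osd == CRUSH_ITEM_NONE)
present = [osd for osd in acting if osd != CRUSH_ITEM_NONE]
print(missing)   # 2 -> two of the six shards have no OSD
print(present)   # [60, 23, 96, 36]
```

With two of six shards unplaced, a 4+2 PG still has k=4 shards, which is why lowering min_size to 4 lets it go active.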


You mentioned 9 hosts, so if you use a simple host-based CRUSH rule, Ceph should be able to find new OSDs for that PG. If you do not use the standard CRUSH rules, please check that Ceph is able to select enough OSDs to satisfy the PG requirements (six different OSDs).


The 'incomplete' part might be a problem. If only a chunk were missing, the state should be undersized, not incomplete...


Regards,

Burkhard

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
