Re: ceph Nautilus lost two disk over night everything hangs

Hi,

On 30.03.21 13:05, Rainer Krienke wrote:
Hello,

yes, your assumptions are correct: pxa-rbd is the metadata pool for pxa-ec, which uses an erasure coding 4+2 profile.

In the last hours ceph repaired most of the damage. One inactive PG remained, and ceph health detail then told me:

---------
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg incomplete; 15 daemons have recently crashed; 150 slow ops, oldest one blocked for 26716 sec, daemons [osd.60,osd.67] have slow ops.
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg incomplete
    pg 36.15b is remapped+incomplete, acting [60,2147483647,23,96,2147483647,36] (reducing pool pxa-ec min_size from 5 may help; search ceph.com/docs for 'incomplete')


*snipsnap*

2147483647 is (uint32)(-1), which means no associated OSD. So this PG does not have six independent OSDs, and no backfilling is happening since there are no targets to backfill.
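To see which shards are actually missing, you could query the PG directly; a quick check (using the pg id 36.15b from your health output) might look like:

    # show the up and acting sets for this PG; 2147483647 again marks
    # positions without an assigned OSD
    ceph pg map 36.15b
    # detailed peering state, including which OSDs ceph would probe
    ceph pg 36.15b query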


You mentioned 9 hosts, so if you use a simple host-based crush rule, ceph should be able to find new OSDs for that PG. If you do not use standard crush rules, please check that ceph is able to derive enough OSDs to satisfy the PG requirements (six different OSDs).
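If you are unsure whether your rule can still place six shards, you could pull the crush map and test the rule offline; a rough sketch (the rule id 1 below is just a placeholder, take the real id from the rule dump) would be:

    # which crush rule the EC pool uses, and the numeric ids of all rules
    ceph osd pool get pxa-ec crush_rule
    ceph osd crush rule dump
    # grab the compiled crush map from the cluster
    ceph osd getcrushmap -o /tmp/crushmap
    # check whether the rule can map 6 OSDs; bad mappings show unfilled slots
    crushtool -i /tmp/crushmap --test --rule 1 --num-rep 6 --show-bad-mappings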


The 'incomplete' part might be a problem. If just a chunk were missing, the state should be undersized, not incomplete...
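As the health message itself suggests, temporarily lowering min_size below k+1 may let the incomplete PG peer again with only four chunks. If you try that, treat it as a temporary measure and revert it once recovery has finished, e.g.:

    # current value (5 = k+1 for a 4+2 profile)
    ceph osd pool get pxa-ec min_size
    # allow peering/I/O with only k=4 chunks present (temporary!)
    ceph osd pool set pxa-ec min_size 4
    # after the PG has recovered, restore the safer default
    ceph osd pool set pxa-ec min_size 5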


Regards,

Burkhard

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



