Re: ceph osd purge => 4 PGs stale+undersized+degraded+peered

Hello,

My 4 PGs are now active!
David Casier from aevoo.fr has succeeded in mounting the OSD I had recently purged.
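
I don't have David's exact procedure to share, but as far as I understand the general idea (assuming the data directory of the purged OSD is still intact once the disk is mounted again) is to export the missing PG from the old OSD and import it into one of the OSDs now acting for it, with ceph-objectstore-tool. A rough sketch, using the IDs from my case (pg 1.561, old OSD 64, new primary OSD 70) as examples:

# on the host where the purged OSD's data is mounted (OSD not running)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-64 \
    --pgid 1.561 --op export --file /tmp/pg1.561.export

# on the host holding one of the PG's current OSDs (stop that OSD first)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-70 \
    --pgid 1.561 --op import --file /tmp/pg1.561.export

# restart the target OSD and let peering/recovery take over
systemctl start ceph-osd@70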

The problem seems to be a bug in the number of retries (the firstn values) in the CRUSH rule.
In fact, I thought my PGs were replicated across the 3 rooms, but that was not the case.
I run Ceph 15.2.15.

Here is the CRUSH rule which did not replicate the PGs over the 3 rooms (bug??):
rule 3replicats3sites_rule {
    id 2
    type replicated
    min_size 2
    max_size 3
    step take default
    step choose firstn 3 type room
    step chooseleaf firstn 4 type host
    step emit
}

And here is the corrected rule, which now distributes the PGs over the 3 rooms as intended. As far as I understand, the old "firstn 3 / firstn 4" form emits up to 12 candidate OSDs grouped by room, so a size-3 pool can end up with all three replicas on hosts in the first room; "firstn 0 / firstn 1" picks the pool-size number of rooms and exactly one host in each:
rule 3replicats3sites_rule {
    id 2
    type replicated
    min_size 2
    max_size 3
    step take default
    step choose firstn 0 type room
    step chooseleaf firstn 1 type host
    step emit
}
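
For anyone who wants to check a rule change like this before injecting it, the placement can be simulated offline with crushtool. A quick sketch (rule id 2 and 3 replicas come from the rule above; the file names are just examples):

# grab and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# edit the rule in crushmap.txt, then recompile
crushtool -c crushmap.txt -o crushmap-new.bin

# simulate placements for rule 2 with 3 replicas;
# --show-bad-mappings reports any input that gets fewer than 3 OSDs
crushtool -i crushmap-new.bin --test --rule 2 --num-rep 3 --show-mappings
crushtool -i crushmap-new.bin --test --rule 2 --num-rep 3 --show-bad-mappings

# inject only once the simulated mappings look right
ceph osd setcrushmap -i crushmap-new.bin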

Thank you for your answers.

Rafael


On 17/01/2022 at 15:24, Rafael Diaz Maurin wrote:
Hello,

All my pools on the cluster are replicated (x3).

I purged some OSDs (after I had stopped them) and removed the disks from the servers, and now I have 4 PGs in stale+undersized+degraded+peered.

Reduced data availability: 4 pgs inactive, 4 pgs stale

pg 1.561 is stuck stale for 39m, current state stale+undersized+degraded+peered, last acting [64]
pg 1.af2 is stuck stale for 39m, current state stale+undersized+degraded+peered, last acting [63]
pg 3.3 is stuck stale for 39m, current state stale+undersized+degraded+peered, last acting [48]
pg 9.5ca is stuck stale for 38m, current state stale+undersized+degraded+peered, last acting [49]


Those 4 OSDs have been purged, so they aren't in the CRUSH map anymore.

I tried a pg repair:
ceph pg repair 1.561
ceph pg repair 1.af2
ceph pg repair 3.3
ceph pg repair 9.5ca


The PGs are remapped, but none of the degraded objects have been repaired:

ceph pg map 9.5ca
osdmap e355782 pg 9.5ca (9.5ca) -> up [54,75,82] acting [54,75,82]
ceph pg map 3.3
osdmap e355782 pg 3.3 (3.3) -> up [179,180,107] acting [179,180,107]
ceph pg map 1.561
osdmap e355785 pg 1.561 (1.561) -> up [70,188,87] acting [70,188,87]
ceph pg map 1.af2
osdmap e355789 pg 1.af2 (1.af2) -> up [189,74,184] acting [189,74,184]
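
In case more detail is useful, the stuck PGs can also be listed and inspected directly, for example (using 9.5ca from above):

# list every PG currently stuck in the stale state
ceph pg dump_stuck stale

# ask the PG's primary for its full peering state and history
ceph pg 9.5ca query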

How can I repair my 4 PGs?

This affects the cephfs-metadata pool, and the filesystem is degraded because the rank 0 MDS is stuck in the rejoin state.


Thank you.

Rafael





--
Rafael Diaz Maurin
DSI de l'Université de Rennes 1
Pôle Infrastructures, équipe Systèmes
02 23 23 71 57

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
