Hello,
My 4 PGs are now active!
David Casier from aevoo.fr has succeeded in mounting the OSD I had
recently purged.
The problem seems to be in the number of retries/choices configured in the
CRUSH rule of the crushmap.
In fact, I thought my PGs were replicated across the 3 rooms, but that was
not the case.
I run Ceph 15.2.15 (Octopus).
Here is the rule from the crushmap which did not replicate the PGs over the 3
rooms (a bug??):
rule 3replicats3sites_rule {
        id 2
        type replicated
        min_size 2
        max_size 3
        step take default
        step choose firstn 3 type room
        step chooseleaf firstn 4 type host
        step emit
}
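For reference, this kind of placement can be checked offline with crushtool on
a copy of the crushmap (rule id 2 and 3 replicas here, as in the rule above).
With the old rule, the mappings should show 3 OSDs all taken from hosts of a
single room:

ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 2 --num-rep 3 --show-mappings | head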
And here is the corrected rule, which now distributes the PGs correctly over
the 3 rooms:
rule 3replicats3sites_rule {
        id 2
        type replicated
        min_size 2
        max_size 3
        step take default
        step choose firstn 0 type room
        step chooseleaf firstn 1 type host
        step emit
}
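For anyone hitting the same issue, the rule can be edited with the usual
decompile / recompile cycle (file names below are just examples), and tested
again with crushtool --test before injecting it:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit the rule in crush.txt (firstn 0 rooms, firstn 1 host per room)
crushtool -c crush.txt -o crush-new.bin
crushtool -i crush-new.bin --test --rule 2 --num-rep 3 --show-mappings | head
ceph osd setcrushmap -i crush-new.bin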
Thank you for your answers.
Rafael
On 17/01/2022 at 15:24, Rafael Diaz Maurin wrote:
Hello,
All my pools on the cluster are replicated (x3).
I purged some OSDs (after stopping them) and removed the disks from the
servers, and now I have 4 PGs in stale+undersized+degraded+peered:
Reduced data availability: 4 pgs inactive, 4 pgs stale
pg 1.561 is stuck stale for 39m, current state
stale+undersized+degraded+peered, last acting [64]
pg 1.af2 is stuck stale for 39m, current state
stale+undersized+degraded+peered, last acting [63]
pg 3.3 is stuck stale for 39m, current state
stale+undersized+degraded+peered, last acting [48]
pg 9.5ca is stuck stale for 38m, current state
stale+undersized+degraded+peered, last acting [49]
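(For reference, these stuck PGs should also be listed by: ceph pg dump_stuck stale)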
Those 4 OSDs have been purged, so they aren't in the crushmap anymore.
I tried a pg repair on each of them:
ceph pg repair 1.561
ceph pg repair 1.af2
ceph pg repair 3.3
ceph pg repair 9.5ca
The PGs are remapped but none of the degraded objects have been repaired:
ceph pg map 9.5ca
osdmap e355782 pg 9.5ca (9.5ca) -> up [54,75,82] acting [54,75,82]
ceph pg map 3.3
osdmap e355782 pg 3.3 (3.3) -> up [179,180,107] acting [179,180,107]
ceph pg map 1.561
osdmap e355785 pg 1.561 (1.561) -> up [70,188,87] acting [70,188,87]
ceph pg map 1.af2
osdmap e355789 pg 1.af2 (1.af2) -> up [189,74,184] acting [189,74,184]
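I suppose a pg query (e.g. ceph pg 1.561 query) could give more detail on the
peering state, if it responds at all despite the stale state.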
How can I repair my 4 PGs?
This affects the cephfs-metadata pool, and the filesystem is degraded
because the rank 0 MDS is stuck in the rejoin state.
Thank you.
Rafael
--
Rafael Diaz Maurin
DSI de l'Université de Rennes 1
Pôle Infrastructures, équipe Systèmes
02 23 23 71 57
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx