Hi,
we replaced some of our OSDs a while ago, and while everything recovered
as planned, one PG is still stuck in active+clean+remapped with no
backfilling taking place.
Mapping the PG in question shows me that one OSD is missing:
$ ceph pg map 35.1fe
osdmap e1265760 pg 35.1fe (35.1fe) -> up
[97,190,65,23,393,223,2147483647,354,132] acting
[97,190,65,23,393,223,112,354,132]
It seems that osd.112 should be replaced with another OSD (2147483647 in
the up set is the value CRUSH reports when it cannot map an OSD to that
slot), and I suspect that CRUSH cannot find a suitable replacement.
Pool 35 is EC with k=7 and m=2, and our cluster has 9 OSD nodes. Is this
just a case of CRUSH giving up too early, as described in the
troubleshooting PGs section[0] of the docs? Running the test described
there with `crushtool` gives several bad mapping rule results for
"--num-rep 9".
If so, would it help to just add new OSDs to the existing hosts or would
it be better to add a whole new OSD host?
Are there other options (e.g. upmap) to force this single PG to use a
different set of OSDs for its "up" set?
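I was thinking of something along these lines, where osd.400 is a purely
hypothetical target on a host that does not already hold a shard of this
PG:

$ ceph osd pg-upmap-items 35.1fe 112 400

But I am not sure whether that works when the corresponding slot in the
up set is already 2147483647.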
[0]
https://github.com/ceph/ceph/blob/master/doc/rados/troubleshooting/troubleshooting-pg.rst#crush-gives-up-too-soon
Thanks,
Michael