PG stuck at active+clean+remapped

Hi,

we replaced some of our OSDs a while ago, and while everything recovered as planned, one PG is still stuck at active+clean+remapped with no backfilling taking place.

Mapping the PG in question shows me that one OSD is missing:

$ ceph pg map 35.1fe
osdmap e1265760 pg 35.1fe (35.1fe) -> up [97,190,65,23,393,223,2147483647,354,132] acting [97,190,65,23,393,223,112,354,132]

It seems that osd.112 should be replaced with another OSD (the 2147483647 in the up set is the ITEM_NONE placeholder CRUSH reports for a slot it could not fill), and I suspect that CRUSH cannot find a suitable replacement.

Pool 35 is EC with k=7 and m=2, and our cluster has 9 OSD nodes. Is this just a case of CRUSH giving up too early, as described in the troubleshooting PGs section[0] of the docs? Running the test described there with `crushtool` gives several bad mapping results for "--num-rep 9".
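For reference, the test was roughly the following (the <ec-rule-id> below is just a placeholder for whatever rule pool 35 actually uses, and the --max-x range is arbitrary):

$ ceph osd getcrushmap -o crush.bin
$ crushtool -i crush.bin --test --show-bad-mappings --rule <ec-rule-id> --num-rep 9 --min-x 1 --max-x 1024

The fix suggested in that docs section is raising set_choose_tries in the rule: decompile with `crushtool -d`, set e.g. "step set_choose_tries 100", recompile with `crushtool -c`, re-run the test above, and inject the new map with `ceph osd setcrushmap -i`.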

If so, would it help to just add new OSDs to the existing hosts, or would it be better to add a whole new OSD host?

Are there other options (e.g. upmap) to force this single PG to use a different set of OSDs for its "up" set?
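(For illustration only, I assume it would be something along the lines of the following, with <target-osd> a hypothetical OSD on a host this PG does not already use; since the up slot shows NONE rather than osd.112, I am not sure a pg-upmap-items exception would even take effect here:)

$ ceph osd set-require-min-compat-client luminous
$ ceph osd pg-upmap-items 35.1fe 112 <target-osd>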

[0] https://github.com/ceph/ceph/blob/master/doc/rados/troubleshooting/troubleshooting-pg.rst#crush-gives-up-too-soon

Thanks,
Michael