On 2020-08-22 03:19, Michael Thomas wrote:
>> Yes, I do have different crush rules to help map certain types of data
>> to different classes of hardware (EC HDDs, replicated SSDs, replicated
>> nvme). The default crush rule for the device_health_metrics pool was
>> to use replication across any storage device. I changed it to use the
>> replicated nvme crush rule, and now the map looks different:
>>
>> # ceph pg map 1.0
>> osdmap e7256 pg 1.0 (1.0) -> up [24,22,12] acting [41,0]
>>
>> However, the acting set of OSDs has not changed.
>
> A little more info:
>
> ceph status is reporting a slow OSD, which happens to be the primary OSD
> for the offending PG:
>
>     health: HEALTH_WARN
>             1 pools have many more objects per pg than average
>             1 backfillfull osd(s)
>             2 nearfull osd(s)
>             Reduced data availability: 1 pg inactive
>             304 pgs not deep-scrubbed in time
>             2 pool(s) backfillfull
>             2294 slow ops, oldest one blocked for 1122032 sec, osd.41
>             has slow ops

According to the documentation:

"Up Set
The ordered list of OSDs responsible for a particular placement group
for a particular epoch according to CRUSH. Normally this is the same as
the Acting Set, except when the Acting Set has been explicitly
overridden via pg_temp in the OSD Map."

Is that indeed the case? Can you see this in the osdmap
(ceph osd dump | grep pg_temp)?

Are there any upmaps for that PG, i.e. ceph osd dump | grep upmap |
grep "1\.0"? If there is one, you can try to remove it
(ceph osd rm-pg-upmap-items 1.0). If there isn't one, you can set one
according to your crush rule policy, e.g.:

ceph osd pg-upmap-items 1.0 24 some_other_osd

We used to hit a similar issue (a bug in Mimic which has since been
fixed) where this could happen to some PGs. We would then upmap the PG
(and later remove the upmap) to fix it. Maybe it works here as well.

Gr. Stefan
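
To spell the sequence out as commands (a sketch only: PG 1.0 and OSD 24
are taken from the output quoted above, and <nvme_osd_id> is a
placeholder for an OSD that actually matches the nvme crush rule in
this cluster, not something known from this thread):

1) Check whether the acting set is pinned via pg_temp in the osdmap:
   ceph osd dump | grep pg_temp | grep "1\.0"

2) Check whether an explicit upmap exists for that PG:
   ceph osd dump | grep upmap | grep "1\.0"

3) If an upmap shows up, remove it and let CRUSH place the PG again:
   ceph osd rm-pg-upmap-items 1.0

4) If no upmap exists, remap one of the up-set OSDs to a device that
   matches the intended rule, for example:
   ceph osd pg-upmap-items 1.0 24 <nvme_osd_id>

5) Afterwards, verify that the up and acting sets converge:
   ceph pg map 1.0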