Quoting Brent Kennedy (bkennedy@xxxxxxxxxx):
> Unfortunately, this cluster was set up before the calculator was in
> place and before the equation was well understood. We have the
> storage space to move the pools and recreate them, which was
> apparently the only way to handle the issue (you are suggesting what
> appears to be a different approach). I was hoping to avoid doing all
> of this because the migration would be very time consuming. There is
> no way to fix the stuck PGs though? If I were to expand the
> replication to 3 instances, would that help with the PGs-per-OSD
> issue any?

No! It will make the problem worse, because each replica needs its own
copy of every PG: the more replicas, the more PG copies each OSD has
to host.

> The math was originally based on 3, not the current 2. Sounds
> like it may change to 300 max, which may not be helpful...
> When you say enforce, do you mean it will block all access to the
> cluster/OSDs?

No, it means you will not be able to increase the number of PGs on the
pool.

> We have upgraded from Hammer to Jewel and then Luminous 12.2.2 as of
> today. During the Hammer-to-Jewel upgrade we lost two host servers
> and let the cluster rebalance/recover; it ran out of space and
> stalled. We then added three new host servers and let the cluster
> rebalance/recover. During that process, at some point, we ended up
> with 4 PGs that could not be repaired using "ceph pg repair xx.xx".
> I tried using "ceph pg 11.720 query" and from what I can tell the
> missing information matches, but the PG is being blocked from being
> marked clean. I keep seeing references to ceph-objectstore-tool as
> an export/restore method, but I cannot find a step-by-step process
> for our current predicament. It may also be acceptable for us to
> simply lose the data, if it can't be extracted, so we can at least
> return the cluster to a healthy state. Any thoughts?
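The replica arithmetic above can be sketched numerically. This is an
illustrative sketch with made-up cluster numbers (4096 PGs, 20 OSDs are
hypothetical, not taken from the thread), not Ceph code:

```python
# Rough PG-per-OSD arithmetic: each replica is a full copy of every PG,
# so raising a pool's size from 2 to 3 raises the average PG-copy count
# per OSD by 50%, which worsens a too-many-PGs-per-OSD problem.

def pgs_per_osd(pg_num: int, size: int, num_osds: int) -> float:
    """Average number of PG copies each OSD ends up hosting."""
    return pg_num * size / num_osds

# Hypothetical numbers: a pool with 4096 PGs on a 20-OSD cluster.
print(pgs_per_osd(4096, 2, 20))  # size=2 -> 409.6 copies per OSD
print(pgs_per_osd(4096, 3, 20))  # size=3 -> 614.4, even further past the limit
```

The point is that replication size multiplies the per-OSD load directly,
so going from 2 to 3 replicas cannot relieve a PGs-per-OSD warning.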
What is the output of:

  ceph daemon osd.$ID config show | grep osd_allow_recovery_below_min_size

If you are below min_size, recovery will not complete while that
setting is false. Maybe this thread is interesting:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005613.html

Especially the case where an OSD is a candidate backfill target but
does not yet contain any data.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
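The min_size gate Stefan describes can be sketched as a simple
predicate. This is illustrative logic only (the function name and
argument layout are my own, not Ceph source):

```python
# Sketch of the recovery gate: with osd_allow_recovery_below_min_size
# set to false, a PG whose acting set holds fewer replicas than the
# pool's min_size cannot make recovery progress.

def can_recover(acting_replicas: int, min_size: int,
                allow_below_min_size: bool) -> bool:
    """True if recovery is allowed to proceed for this PG."""
    return acting_replicas >= min_size or allow_below_min_size

# A pool with min_size=2 that is down to a single surviving replica:
print(can_recover(1, 2, allow_below_min_size=False))  # False: recovery stalls
print(can_recover(1, 2, allow_below_min_size=True))   # True: recovery may proceed
```

This is why checking the current value of that option on the affected
OSDs is a useful first diagnostic step before reaching for
ceph-objectstore-tool.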