Unfortunately, this cluster was set up before the calculator was in place and when the equation was not well understood. We have the storage space to move the pools and recreate them, which was apparently the only way to handle the issue (you are suggesting what appears to be a different approach). I was hoping to avoid doing all of this because the migration would be very time consuming. Is there really no way to fix the stuck PGs, though?

If I were to expand the replication to 3 instances, would that help with the PGs-per-OSD issue at all? The math was originally based on 3, not the current 2. It sounds like the limit may change to a 300 max, which may not be helpful...

When you say "enforce", do you mean it will block all access to the cluster/OSDs?

-Brent
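For reference, a minimal sketch of the usual PG-per-OSD arithmetic, using made-up pool numbers rather than this cluster's actual values. Since every replica of a PG counts toward an OSD's total, raising the pool size from 2 to 3 would be expected to increase the PGs-per-OSD figure, not reduce it:

    # Rough PG-per-OSD arithmetic with hypothetical pools and OSD count.
    # Each replica of a PG lands on some OSD, so replicas count toward the total.
    pools = [
        {"pg_num": 4096, "size": 2},   # hypothetical pool
        {"pg_num": 2048, "size": 2},   # hypothetical pool
    ]
    num_osds = 24                      # hypothetical OSD count

    pgs_per_osd = sum(p["pg_num"] * p["size"] for p in pools) / num_osds
    print(pgs_per_osd)                 # 512.0 at size 2

    # The same pools with replication raised to 3:
    pgs_per_osd_size3 = sum(p["pg_num"] * 3 for p in pools) / num_osds
    print(pgs_per_osd_size3)           # 768.0 -- higher, not lower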
From: Janne Johansson [mailto:icepic.dz@xxxxxxxxx]

2018-01-05 6:56 GMT+01:00 Brent Kennedy <bkennedy@xxxxxxxxxx>:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is the issue. A temporary workaround would be to bump the hard_ratio and perhaps restart the OSDs afterwards (or add a ton of OSDs so the PG/OSD count gets below 200). In your case, osd max pg per osd hard ratio would need to go from 2.0 to 26.0 or above, which is probably rather crazy. The thing is that Luminous 12.2.2 starts enforcing this, which previous versions didn't (at least not in the same way). Even if it is rather unusual to run into this, you should have seen the warning before (even if the warning only triggered above 300 previously), which also means you should perhaps have considered not upgrading while the cluster wasn't HEALTH_OK, if it was warning about a huge number of PGs before going to 12.2.2.

--
May the most significant bit of your life be positive.
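For context on where a ratio like 26.0 comes from, here is a small sketch of how the Luminous hard limit is usually described: new PG creation is refused once an OSD's count exceeds mon_max_pg_per_osd multiplied by osd_max_pg_per_osd_hard_ratio. The defaults of 200 and 2.0 and the per-OSD figure below are assumptions for illustration, not measurements from this cluster:

    # Sketch of the Luminous 12.2.x hard limit, assuming the commonly cited
    # defaults: mon_max_pg_per_osd = 200 and osd_max_pg_per_osd_hard_ratio = 2.0.
    mon_max_pg_per_osd = 200
    hard_ratio_default = 2.0

    hard_limit = mon_max_pg_per_osd * hard_ratio_default
    print(hard_limit)                  # 400.0 PGs per OSD before PG creation is blocked

    # If a cluster were sitting at roughly 5200 PGs per OSD (hypothetical figure),
    # the ratio needed to stay under the hard limit would be:
    current_pgs_per_osd = 5200
    needed_ratio = current_pgs_per_osd / mon_max_pg_per_osd
    print(needed_ratio)                # 26.0 -- the "rather crazy" value mentioned above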