Re: Reduced data availability: 4 pgs inactive, 4 pgs incomplete

Janne Johansson <icepic.dz@xxxxxxxxx> · Fri, 5 Jan 2018 09:33:59 +0100

2018-01-05 6:56 GMT+01:00 Brent Kennedy <bkennedy@xxxxxxxxxx>:
We have upgraded from Hammer to Jewel and then Luminous 12.2.2 as of today.  During the hammer upgrade to Jewel we lost two host servers and let the cluster rebalance/recover, it ran out of space and stalled.  We then added three new host servers and then let the cluster rebalance/recover. During that process, at some point, we ended up with 4 pgs not being able to be repaired using “ceph pg repair xx.xx”.  I tried using ceph pg 11.720 query and from what I can tell the missing information matches, but is being blocked from being marked clean.  I keep seeing references to the ceph-object-store tool to use as an export/restore method, but I cannot find details on a step by step process given the current predicament.  It may also be possible for us to just lose the data if it cant be extracted so we can at least return the cluster to a healthy state.  Any thoughts?

Ceph –s output:

cluster:
    health: HEALTH_ERR
            Reduced data availability: 4 pgs inactive, 4 pgs incomplete
            Degraded data redundancy: 4 pgs unclean
            4 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2549 > max 200)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is the issue.
A temp workaround will be to bump the hard_ratio and perhaps restart the OSDs after (or add a ton of OSDs so the
PG/OSD gets below 200) 
In your case, the
    osd max pg per osd hard ratio
needs to go from 2.0 to 26.0 or above, which probably is rather crazy.

The thing is that Luminous 12.2.2 starts enforcing this which previous versions didn't (at least not in the same way).

Even if it is rather weird to run into this, you should have seen the warning before (even if it was > 300 previously) which also means
you should perhaps have considered not upgrading when the cluster wasn't HEALTH_OK if it was warning about huge amount of PGs
before going to 12.2.2.

-- 
May the most significant bit of your life be positive.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com