We have upgraded from Hammer to Jewel, and then to Luminous 12.2.2 as of today. During the Hammer-to-Jewel upgrade we lost two host servers and let the cluster rebalance/recover; it ran out of space and stalled. We then added three new host servers and let the cluster rebalance/recover again. At some point during that process we ended up with 4 pgs that cannot be repaired using "ceph pg repair xx.xx". I ran "ceph pg 11.720 query" and, from what I can tell, the missing information matches, but the pg is being blocked from being marked clean.

I keep seeing references to ceph-objectstore-tool as an export/restore method, but I cannot find a step-by-step procedure that fits our current predicament. It may also be acceptable for us to simply lose the data if it can't be extracted, so that we can at least return the cluster to a healthy state. Any thoughts?
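For what it's worth, below is the rough export/import sequence I've pieced together so far from the docs and list archives. The OSD numbers (osd.12, osd.27), paths, and export filename are just placeholders for our setup (FileStore OSDs under /var/lib/ceph/osd), so please correct me if any step is wrong or unsafe for pgs that are incomplete rather than just inconsistent:

    # stop the OSD that still holds a copy of the stuck pg (placeholder osd.12)
    systemctl stop ceph-osd@12

    # export the pg from the stopped OSD's data store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 11.720 --op export --file /root/pg.11.720.export

    # import it into another (also stopped) OSD that should hold the pg
    systemctl stop ceph-osd@27
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 \
        --op import --file /root/pg.11.720.export

    # restart both OSDs and watch recovery
    systemctl start ceph-osd@12 ceph-osd@27
    ceph -w

Is that roughly the right sequence, or am I missing steps in between?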
ceph -s output:
cluster:
health: HEALTH_ERR
Reduced data availability: 4 pgs inactive, 4 pgs incomplete
Degraded data redundancy: 4 pgs unclean
4 stuck requests are blocked > 4096 sec
too many PGs per OSD (2549 > max 200)