On 09/17/2018 04:33 PM, Gregory Farnum wrote:
On Mon, Sep 17, 2018 at 8:21 AM Graham Allan <gta@xxxxxxx> wrote:
Looking back through history it seems that I *did* override the min_size
for this pool; however, I didn't reduce it - it used to have min_size 2!
That made no sense to me - I think it must be an artifact of a very early
(hammer?) EC pool creation, but it pre-dates me.
I found the documentation on what min_size should be a bit confusing,
which is how I arrived at 4. Fully agree that k+1=5 makes way more
sense.
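(For reference, I believe the profile and current setting can be checked
with something like the following - the pool and profile names here are
just illustrative, not our actual ones:

    ceph osd pool get ec-pool erasure_code_profile
    ceph osd erasure-code-profile get ec-profile
    ceph osd pool get ec-pool min_size

With a k=4 profile the erasure-code-profile output should show k=4,
hence k+1=5 for min_size.)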
I don't think I was the only one confused by this though, eg
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html
I suppose the safest thing to do is update min_size->5 right away to
force any size-4 pgs down until they can perform recovery. I can set
force-recovery on these as well...
Mmm, this is embarrassing but that actually doesn't quite work due to
https://github.com/ceph/ceph/pull/24095, which has been on my task list
but at the bottom for a while. :( So if your cluster is stable now I'd
let it clean up and then change the min_size once everything is repaired.
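(When you do get there, it should just be something like the following -
substitute your actual pool name:

    ceph osd pool set <poolname> min_size 5
    ceph osd pool get <poolname> min_size

the second command just to confirm the change took.)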
Thanks for your feedback, Greg. Since I declared the dead OSD lost, the
down PG has become active again and is successfully serving data. The
cluster is considerably more stable now; I've set force-backfill or
force-recovery on any size=4 PGs and can wait for that to complete
before changing anything else.
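For anyone finding this thread later, the commands involved were along
these lines (the OSD id and pg ids below are placeholders rather than
the real ones from our cluster):

    ceph osd lost 123 --yes-i-really-mean-it
    ceph pg force-backfill 70.2f
    ceph pg force-recovery 70.31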
Thanks again,
Graham
--
Graham Allan
Minnesota Supercomputing Institute - gta@xxxxxxx