Re: reducing min_size on erasure coded pool may allow recovery ?


 



min_size should be at least k+1 for EC. There are times to use k for emergencies like you had. I would suggest setting it back to 3 once the cluster is healthy again.
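For example (the pool name "ecpool" below is just a placeholder, substitute your own pool), you could check and restore it with something like:

   # check the current min_size on the EC pool
   ceph osd pool get ecpool min_size
   # once recovery has finished, raise it back to k+1 (3 for k2m2)
   ceph osd pool set ecpool min_size 3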

As far as why you needed to reduce min_size, my guess would be that recovery would have happened as long as k copies were up. Were the PGs refusing to backfill, or had they just not backfilled yet?
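If it happens again, querying one of the incomplete PGs usually shows what it is waiting on (the PG id below is just an example):

   # dump the PG's full state, including recovery_state and peering details
   ceph pg 2.7f query
   # or list all PGs stuck inactive
   ceph pg dump_stuck inactive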

On Mon, Oct 29, 2018, 9:24 PM Chad W Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
Hi all,
   Recently our cluster lost a drive and a node (3 drives) at the same
time.  Our erasure coded pools are all k2m2, so if all is working
correctly no data is lost.
   However, there were 4 PGs that stayed "incomplete" until I finally
took the suggestion in 'ceph health detail' to reduce min_size. (Thanks
for the hint!)  I'm not sure what it was (likely 3), but setting it to 2
caused all PGs to become active (though degraded) and the cluster is on
path to recovering fully.

   In replicated pools, would not ceph create replicas without the need
to reduce min_size?  It seems odd to not recover automatically if
possible.  Could someone explain what was going on there?

   Also, how to decide what min_size should be?

Thanks!
Chad.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
