I would set min_size back to 2 for general running, but put it down to 1 during planned maintenance. There are a lot of threads on the ML talking about why you shouldn't run with min_size of 1.
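Something like this is what I have in mind (just a sketch, using the 'volumes' pool name from your "ceph osd pool ls detail" output below):

Before the planned maintenance, allow client I/O to continue on a single replica:
# ceph osd pool set volumes min_size 1

...do the maintenance / reboot the node...

Once the cluster is back to HEALTH_OK, restore the safer setting:
# ceph osd pool set volumes min_size 2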
On Thu, Aug 10, 2017, 11:36 PM Hyun Ha <hfamily15@xxxxxxxxx> wrote:
Thanks for the reply.

In my case, it was an issue with the min_size of the pool.

# ceph osd pool ls detail
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 844 flags hashpspool stripe_width 0
        removed_snaps [1~23]

When replicated size=2 and min_size=2 are set and an osd goes down, the ceph cluster goes into an ERR state and client I/O hangs.

ceph status log>
 health HEALTH_ERR
        310 pgs are stuck inactive for more than 300 seconds
        35 pgs backfill_wait
        3 pgs backfilling
        38 pgs degraded
        382 pgs peering
        310 pgs stuck inactive
        310 pgs stuck unclean
        39 pgs undersized
        263 requests are blocked > 32 sec

You can simply reproduce that.

So I solved this by setting min_size=1 using the "ceph osd pool set volumes min_size 1" command.

It is a very strange thing, because if min_size can cause a big problem for the ceph cluster, ceph should not allow it to be set to the same value as the replicated size.

Thanks.

2017-08-10 23:33 GMT+09:00 David Turner <drakonstein@xxxxxxxxx>:

When the node reboots, are the osds being marked down immediately? If the node were to reboot, but not mark the osds down, then all requests to those osds would block until they got marked down.
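While the node reboots, it is worth watching whether its osds actually get marked down right away (generic checks, nothing specific to your setup):

# ceph osd tree
shows which osds are up or down at that moment, and
# ceph -w
follows the cluster log live so you can see how long the down takes to register.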
On Thu, Aug 10, 2017, 5:46 AM Hyun Ha <hfamily15@xxxxxxxxx> wrote:

Hi, Ramirez

I have exactly the same problem as yours.
Did you solve that issue?
Do you have experiences or solutions?

Thank you.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com