Thanks for the reply.
In my case, the issue was the min_size setting of the pool.
# ceph osd pool ls detail
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 844 flags hashpspool stripe_width 0
removed_snaps [1~23]
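(For reference, the relevant values can also be queried directly; the exact output format may differ slightly by release:)
# ceph osd pool get volumes size
size: 2
# ceph osd pool get volumes min_size
min_size: 2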
When replicated size=2 and min_size=2 are set and an OSD goes down, the ceph cluster goes into an error state (HEALTH_ERR) and client I/O hangs.
ceph status output:
health HEALTH_ERR
310 pgs are stuck inactive for more than 300 seconds
35 pgs backfill_wait
3 pgs backfilling
38 pgs degraded
382 pgs peering
310 pgs stuck inactive
310 pgs stuck unclean
39 pgs undersized
263 requests are blocked > 32 sec
You can reproduce this easily; a rough sketch is below.
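For example, a minimal sketch (assuming a systemd-based deployment; osd.0 is just an example of an OSD that serves the pool):

# ceph osd pool set volumes min_size 2
# systemctl stop ceph-osd@0
# ceph status

As soon as one replica is gone, the affected PGs fall below min_size, become inactive, and client I/O to them blocks.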
So I solved this by setting min_size=1 with the "ceph osd pool set volumes min_size 1" command.
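To verify the change (pool name as above):

# ceph osd pool get volumes min_size
min_size: 1

Note that min_size=1 with size=2 means a PG keeps accepting writes with only one copy, so data written while an OSD is down exists on a single OSD until recovery completes.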
It is a very strange thing, because if setting min_size equal to the replicated size can cause such a big problem for the cluster, I would expect ceph not to allow min_size to be set to the same value as the replicated size.
Thanks.
2017-08-10 23:33 GMT+09:00 David Turner <drakonstein@xxxxxxxxx>:
When the node rebooted, were the osds marked down immediately? If the node were to reboot, but not mark the osds down, then all requests to those osds would block until they got marked down.
On Thu, Aug 10, 2017, 5:46 AM Hyun Ha <hfamily15@xxxxxxxxx> wrote:

Hi, Ramirez

I have exactly the same problem as yours. Did you solve that issue? Do you have experiences or solutions?

Thank you.
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com