Hi list,
While testing Ceph, I sometimes find that the whole cluster is blocked, and
the cause turns out to be a single malfunctioning OSD. Ceph can heal itself
when an OSD is down, but it seems that if an OSD is half dead (it still sends
heartbeats but can't handle requests), all requests directed to that OSD are
blocked. If all OSDs are in one pool, the whole cluster ends up blocked
because of that one hung OSD.
I think this is because Ceph distributes requests across all OSDs, and if one
OSD never confirms that a request is done, everything is blocked.
Is there a way to have Ceph mark the crippled OSD down when requests directed
to it have been blocked for longer than a certain time, so that the whole
cluster doesn't get blocked?
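
To make the question concrete, the policy being asked about could be sketched
roughly as below. This is only an illustration in Python, not Ceph code: the
function name, the input format (OSD id mapped to how long each of its
outstanding requests has been blocked, in seconds) and the 60 second threshold
are all made up for the example.

```python
from typing import Dict, List


def osds_to_mark_down(blocked_seconds: Dict[int, List[float]],
                      max_blocked: float = 60.0,
                      min_blocked_requests: int = 1) -> List[int]:
    """Return the ids of OSDs that look "half dead".

    blocked_seconds maps a (hypothetical) OSD id to the ages, in seconds,
    of the requests currently waiting on that OSD. An OSD is reported when
    at least min_blocked_requests of its requests have been waiting longer
    than max_blocked seconds.
    """
    suspects = []
    for osd_id, ages in sorted(blocked_seconds.items()):
        # Count only the requests that exceeded the allowed blocking time.
        stuck = [age for age in ages if age > max_blocked]
        if len(stuck) >= min_blocked_requests:
            suspects.append(osd_id)
    return suspects


if __name__ == "__main__":
    # Made-up sample: osd 1 has two requests stuck for over a minute.
    sample = {0: [0.2, 1.5], 1: [75.0, 120.3], 2: []}
    print(osds_to_mark_down(sample))
```

With the made-up sample, only osd 1 would be reported. The OSD still sends
heartbeats in this scenario, so the decision here is based purely on how long
its requests have been blocked.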
2018-03-04
shadow_lin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com