On 03/27/2018 12:58 AM, Jared H wrote:
> I have three datacenters with three storage hosts in each, which house
> one OSD/MON per host. There are three replicas, one in each datacenter.
> I want the cluster to be able to survive a nuke dropped on 1/3
> datacenters, scaling up to 2/5 datacenters. I do not need realtime data
> replication (Ceph is already fast enough), but I do need decently
> realtime fault tolerance such that requests are blocked for ideally less
> than 10 seconds.
>
> In testing, I kill networking on 3 hosts and the cluster becomes
> unresponsive for 1-5 minutes as requests are blocked. The monitors are
> detected as down within 15-20 seconds, but OSDs take a long time to
> change state to 'down'.
>
> I have played with these timeout and heartbeat options but they don't
> seem to have any effect:
> [osd]
> osd_heartbeat=3
> osd_heartbeat_grace=9
> osd_mon_heartbeat_interval=3
> osd_mon_report_interval_min=3
> osd_mon_report_interval_max=9
> osd_mon_ack_timeout=9
>
> Is it the nature of the networking failure? I can pkill ceph-osd to
> simulate a software failure and they are detected as down almost instantly.
>

When you kill the OSD process, the other OSDs get a 'connection refused' and can declare the OSD down immediately.

But when you kill the network, connections start to time out instead.

It's hard to judge from the outside what exactly happens, but keep in mind that Ceph is designed with data consistency as the number 1 priority. It will choose safety of data over availability, so if it's not sure what is happening, I/O will block.

Wido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
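
To illustrate the two failure modes described in the reply, here is a minimal Python sketch. It is not Ceph's heartbeat code. The helper names `probe` and `demo`, the loopback address, and the 0.5 second grace value are all assumptions made up for the illustration. A port with nothing listening stands in for a killed ceph-osd process (the host's kernel is still up and refuses the connection), and a listener that never answers stands in for a host whose network has gone away (the peer can only wait out the grace period). It assumes a Linux-like loopback where refusals are reported immediately.

```python
import socket
import time


def probe(host, port, grace):
    """Hypothetical helper: classify a peer as 'up', 'closed', 'refused' or 'timeout'.

    Returns (state, seconds_elapsed).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=grace) as conn:
            conn.settimeout(grace)
            conn.sendall(b"ping")
            reply = conn.recv(16)  # blocks until data, EOF, or the grace period ends
            state = "up" if reply else "closed"
    except ConnectionRefusedError:
        # The remote kernel answered "nobody is listening": a definite failure.
        state = "refused"
    except socket.timeout:
        # Silence: the peer may be dead, slow, or unreachable. We cannot tell which.
        state = "timeout"
    return state, time.monotonic() - start


def demo(grace=0.5):
    # Stand-in for a dead network: the kernel queues the connection,
    # but no application ever accepts it or replies.
    silent = socket.socket()
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)
    silent_port = silent.getsockname()[1]

    # Stand-in for a killed process: reserve a free port, then release it
    # so that nothing is listening there.
    closed = socket.socket()
    closed.bind(("127.0.0.1", 0))
    closed_port = closed.getsockname()[1]
    closed.close()

    results = {
        "killed process": probe("127.0.0.1", closed_port, grace),
        "dead network": probe("127.0.0.1", silent_port, grace),
    }
    silent.close()
    return results


if __name__ == "__main__":
    for scenario, (state, seconds) in demo().items():
        print(f"{scenario}: {state} after {seconds:.3f}s")
```

The refused case returns as soon as the refusal arrives, while the silent case cannot return before the grace period has elapsed. That is the same asymmetry seen between `pkill ceph-osd` and pulling the network.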