Re: Requests blocked as cluster is unaware of dead OSDs for quite a long time

On 03/27/2018 12:58 AM, Jared H wrote:
> I have three datacenters with three storage hosts in each, and each host
> runs one OSD and one MON. There are three replicas, one in each datacenter.
> I want the cluster to be able to survive a nuke dropped on one of the three
> datacenters and, once scaled up, on two of five datacenters. I do not need
> realtime data replication (Ceph is already fast enough), but I do need
> near-realtime fault tolerance, such that requests are blocked for ideally
> less than 10 seconds.
> 
> In testing, I kill networking on 3 hosts and the cluster becomes
> unresponsive for 1-5 minutes as requests are blocked. The monitors are
> detected as down within 15-20 seconds, but the OSDs take a long time to
> change state to 'down'.
>
> I have played with these timeout and heartbeat options, but they don't
> seem to have any effect:
> [osd]
> osd_heartbeat=3
> osd_heartbeat_grace=9
> osd_mon_heartbeat_interval=3
> osd_mon_report_interval_min=3
> osd_mon_report_interval_max=9
> osd_mon_ack_timeout=9
> 
> Is it the nature of the networking failure? When I pkill ceph-osd to
> simulate a software failure, the OSDs are detected as down almost instantly.
> 

When you kill the OSD process, the other OSDs get a 'connection refused'
and can declare the OSD down immediately. But when you kill the network,
nothing answers at all, so the other OSDs have to wait for things to time out.
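To illustrate the difference, here is a minimal Python sketch. It is not Ceph code: `probe`, `free_port` and the 1-second grace are hypothetical, and a listener that never reads is used as a stand-in for a dead network. A probe to a port where nothing listens fails at once with 'connection refused', while a peer that stays silent can only be declared dead after the whole grace period has passed.

```python
import socket
import time


def probe(host, port, grace):
    """Hypothetical failure detector: send a ping, wait up to `grace` seconds
    for a reply, and return (verdict, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection applies `grace` to connect() and to later calls
        with socket.create_connection((host, port), timeout=grace) as conn:
            conn.sendall(b"ping")
            reply = conn.recv(4)  # blocks until data, EOF or timeout
            verdict = "alive" if reply else "closed"
    except ConnectionRefusedError:
        # Nothing listens on the port: the kernel answers with a TCP RST,
        # so the failure is known immediately.
        verdict = "refused"
    except socket.timeout:
        # No answer at all: the only evidence is silence, so the detector
        # has to wait out the whole grace period.
        verdict = "timeout"
    return verdict, time.monotonic() - start


def free_port():
    """Return a localhost port nothing is listening on (a 'killed' daemon)."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # Case 1: daemon killed, host still up -> refused almost instantly.
    print(probe("127.0.0.1", free_port(), grace=1.0))

    # Case 2: stand-in for a dead network -> a listener that completes the
    # TCP handshake (via the kernel backlog) but never reads or replies.
    with socket.socket() as silent:
        silent.bind(("127.0.0.1", 0))
        silent.listen(1)
        print(probe("127.0.0.1", silent.getsockname()[1], grace=1.0))
```

In Ceph, the silence window corresponds roughly to the heartbeat grace you were tuning (`osd_heartbeat_grace`), which is why a network failure is noticed so much later than a killed process.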

It's hard to judge from the outside what exactly happens, but keep in
mind that Ceph is designed with data consistency as its number one
priority. It will choose data safety over availability, so if it is not
sure what is happening, I/O will block.

Wido

> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 