Re: Handling of network failures in the cluster network

On Mon, Oct 13, 2014 at 11:32 AM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
> Hi List,
>
> I have a ceph cluster setup with two networks, one for public traffic
> and one for cluster traffic.
> Network failures in the public network are handled quite well, but
> network failures in the cluster network are handled very badly.
>
> I found several discussions on the mailing list about this topic, and
> they stated that the problem should be fixed, but I still have problems.
>
> I use ceph v0.86 with a standard crushmap, 4 osds per host and 6 hosts
> in the root default, so I have 24 osds overall.
> Each storage node has two 10Gbit nics, one for public and one for cluster
> traffic. If I take down ONE of the links in the cluster network, the
> cluster stops working.
>
> I tested it several times and observed the following different behaviors.
>
> 1. Cluster stops forever.
>
> 2. After a timeout of around 120 seconds all other osds get marked
> down. The osds on the storage node with the link failure stay up. Then
> all other osds boot and come back, the osds on the node with the
> failure are marked down, and the cluster starts to work again.
>
> 3. After a timeout of around 120 seconds the osds on the node with the
> link failure get marked down and the cluster starts to work again.
>
> Therefore a link failure in the cluster network has a very severe impact
> on the cluster availability.
>
> Is this a configuration mistake on my side?
>
> Any help would be greatly appreciated.

How did you test taking down the connection?
What config options have you specified on the OSDs and in the monitor?
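
One way to collect that information is to list which network and
heartbeat options are set explicitly in ceph.conf. The following is only a
minimal sketch: the sample values are made up, the helper names are
hypothetical, and the list of options is an illustrative selection rather
than a definitive set of what matters here. It relies on Ceph treating
spaces and underscores in option names as equivalent.

```python
import configparser

# Hypothetical sample; replace with the contents of your own ceph.conf.
SAMPLE_CONF = """
[global]
public network = 192.168.1.0/24
cluster network = 192.168.2.0/24

[osd]
osd heartbeat grace = 20
"""

# Illustrative selection of options related to the two networks and to
# OSD failure detection (an assumption, not an exhaustive list).
OPTIONS = (
    "public_network",
    "cluster_network",
    "osd_heartbeat_interval",
    "osd_heartbeat_grace",
    "mon_osd_min_down_reporters",
)


def normalize(name):
    """Map 'osd heartbeat grace' and 'osd_heartbeat_grace' to one spelling."""
    return "_".join(name.lower().replace("_", " ").split())


def explicit_options(conf_text):
    """Return {(section, option): value} for the options of interest."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            name = normalize(key)
            if name in OPTIONS:
                found[(section, name)] = value
    return found


if __name__ == "__main__":
    for (section, name), value in sorted(explicit_options(SAMPLE_CONF).items()):
        print(f"[{section}] {name} = {value}")
```

Options that do not show up in the output are not set in the file and are
left at their defaults.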

None of the scenarios you're describing make much sense on a
semi-recent (post-dumpling-release) version of Ceph.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com