Hello Ceph Users,

yesterday I had a defective GBIC in one node of my 10-node Ceph cluster. The GBIC was still partly working but had 50% packet loss: some packets went through, some did not.

What happened was that the whole cluster stopped servicing requests in time. There were lots of timeouts and so on until the problem was isolated. Monitors and OSDs were asked for data but did not answer, or answered late.

I am wondering: here we have a highly redundant network setup and a highly redundant piece of software, but a small network fault brings down the whole cluster.

Is there anything that can be configured or changed in Ceph so that availability is better in the case of flapping networks?

I understand that this is not a Ceph problem but a network problem, but maybe something can be learned from such incidents?

Thanks
Christoph

--
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer: Christoph Adomeit, Hans Wilhelm Terstappen
Christoph.Adomeit@xxxxxxxxxxx
Internetloesungen vom Feinsten
Fon. +49 2166 9149-32
Fax. +49 2166 9149-10