Re: Defective Gbic brings whole Cluster down

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

This has been discussed a few times. The consensus seems to be to make
sure NIC error rates and similar metrics are included in your
monitoring solution. It would also be good to perform periodic network
tests, such as a full-size ping with the don't-fragment flag set
between all nodes, and have your monitoring solution report the
results as well.
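
As a rough illustration of the two checks above, here is a minimal Python
sketch, not a finished monitoring plugin. It assumes Linux with iputils
ping (where "-M do" prohibits fragmentation) and the per-interface
counters under /sys/class/net. The function names, the default MTU of
1500, and the node names in the usage note are hypothetical.

```python
import re
from itertools import permutations
from pathlib import Path

# Assumption: IPv4, so 20 bytes of IP header plus 8 bytes of ICMP header.
IPV4_ICMP_OVERHEAD = 28


def full_size_ping_cmd(host, mtu=1500, count=5):
    """Build an iputils ping command that fills the MTU and forbids fragmentation."""
    payload = mtu - IPV4_ICMP_OVERHEAD
    # -M do: set don't-fragment, -s: payload size, -c: packet count, -q: summary only
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), "-q", host]


def parse_packet_loss(ping_output):
    """Return the packet-loss percentage from ping's summary line, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


def node_pairs(nodes):
    """Every ordered (source, target) pair, so each path is tested in both directions."""
    return list(permutations(nodes, 2))


def read_nic_errors(iface, sysfs_root="/sys/class/net"):
    """Read the cumulative error and drop counters Linux exposes for one interface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    names = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")
    return {name: int((stats_dir / name).read_text()) for name in names}
```

One way to use it: on each node, run full_size_ping_cmd(target) for every
other node with subprocess.run(..., capture_output=True, text=True), feed
the stdout to parse_packet_loss, and report any non-zero loss. The
counters from read_nic_errors are cumulative, so the monitoring side
would alert on the difference between two samples rather than on the
absolute value.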

Although I would like to see such a feature in Ceph, the concern is
that such a feature can quickly get out of hand and that something
else that is really designed for it should do it. I can understand
where they are coming from in that regard, but having Ceph kick out a
misbehaving node quickly is appealing as well (there would have to be
a way to specify that only so many nodes could be kicked out).
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Aug 27, 2015 at 9:37 AM, Christoph Adomeit wrote:
> Hello Ceph Users,
>
> yesterday I had a defective Gbic in 1 node of my 10 node ceph cluster.
>
> The Gbic was working somehow but had 50% packet-loss. Some packets went through, some did not.
>
> What happened was that the whole cluster did not service requests in time; there were lots of timeouts and so on
> until the problem was isolated. Monitors and OSDs were asked for data but did not answer or answered late.
>
> I am wondering, here we have a highly redundant network setup and a highly redundant piece of software, but a small
> network fault brings down the whole cluster.
>
> Is there anything that can be configured or changed in Ceph so that availability improves when the network is flapping?
>
> I understand that this is a network problem rather than a Ceph problem, but maybe something can be learned from such incidents?
>
> Thanks
>   Christoph
> --
> Christoph Adomeit
> GATWORKS GmbH
> Reststrauch 191
> 41199 Moenchengladbach
> Sitz: Moenchengladbach
> Amtsgericht Moenchengladbach, HRB 6303
> Geschaeftsfuehrer:
> Christoph Adomeit, Hans Wilhelm Terstappen
>
> Christoph.Adomeit@xxxxxxxxxxx     Internetloesungen vom Feinsten
> Fon. +49 2166 9149-32                      Fax. +49 2166 9149-10
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-----BEGIN PGP SIGNATURE-----
Version: Mailvelope v1.0.2
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJV31pFCRDmVDuy+mK58QAA7qwQAL0EvbHneC00qhCX/jjT
Xl8whWvQgm/UUDEPAWe2wGkgVZtP3cSAx/p+IkusZuD6NClIiWvazdz5n+vf
cj4Y+S8Zj4Lw7gypHjy5GSCDSbQnEni32QNKp74GM/EZ1331gXuDvP0bS2Sz
7g5MXu8Vpf0Kdrj8JrOPnHY1PtljxkQXdrEmijDkmnjruO+XGFQrl8l9GFbN
enFZI+PpEAoSEJPZosCnX+ZLM3/ZiwAfAPtvcARyDwdmjV7CjyRjVviloR3K
DV/b+VuWX+NVzTZMKCnILVubt1Khexzk6reU3m7Yjy713dmEehDmKQsESFci
pMi61iEuxje0O+iqOp+mhhYWtv+Iv7bbpHcGv04vfMsl6+ms6v/EHo/Cccoi
ZiOa+xD6l7ZkO+A+2bvunBvC3cjBFXn8yrNpHDj6G+jUWMDuJcs7wAhExhPv
Qicjhzk9AoTFXPIkfkGnuHJ/ngFnswdHeVa1DU7GV+Evh/2BCtoHH7Ur+XQY
u7gL6LXt+2UAB3+ZIEvr2NOAFiIVsPqnGqQqNiNz5XQDFh5bD3e1iScucZbm
VNStBkWDoDwrBYVe74cN55ZXA5auTSDYuYlen+BPbYhAKmpkBp+Suv1H4CFy
01cnANvJfbaxoBIPLzvhdx4c73Qd+J6ttxi2g8u8EedXDbPIYGFPy2madvtW
JNPc
=3sV8
-----END PGP SIGNATURE-----