My big worry is that when a single link under a bond fails, it fails so badly that the whole bond stops working. How can I make the bond fail over in such cases?

best regards,

samuel

huxiaoyu@xxxxxxxxxxxx

From: Anthony D'Atri
Date: 2021-06-15 18:22
To: huxiaoyu@xxxxxxxxxxxx
Subject: Re: Issues with Ceph network redundancy using L2 MC-LAG

Which hash mode are you using on the hosts? layer3+4? Are they set up active/active, or active/passive?

I often see suboptimal bonding configurations that result in most or all traffic going over only one link.

> On Jun 15, 2021, at 9:19 AM, huxiaoyu@xxxxxxxxxxxx wrote:
>
> Dear Cephers,
>
> I have encountered the following networking issue several times, and I wonder whether there is a solution for networking HA.
>
> We build Ceph using L2 multi-chassis link aggregation groups (MC-LAG) to provide switch redundancy. On each host, we use 802.3ad (LACP) mode for NIC redundancy. However, we have observed several times that when a single network port fails (either the cable or the SFP+ optical module), the Ceph cluster is badly affected, although in theory it should be able to tolerate the failure.
>
> Did I miss something important here? And how can one really achieve networking HA in a Ceph cluster?
>
> best regards,
>
> Samuel
>
> huxiaoyu@xxxxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
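
The questions raised in this thread (which hash mode is in use, and whether both links are really active in the same LACP aggregator) can be checked on a Linux host by reading the bonding driver's status file under /proc/net/bonding/. The following is only a minimal sketch, not a definitive diagnostic: the bond name "bond0" and the function names are hypothetical, and it assumes the usual "Key: value" layout of that status file.

```python
from pathlib import Path


def parse_bond_status(text):
    """Parse the text of /proc/net/bonding/<bond> into a small dict."""
    info = {"mode": None, "hash_policy": None, "slaves": []}
    current = None  # the slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value" form
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Transmit Hash Policy":
            info["hash_policy"] = value
        elif key == "Slave Interface":
            current = {"name": value, "mii_status": None, "aggregator_id": None}
            info["slaves"].append(current)
        elif current is not None and key == "MII Status":
            current["mii_status"] = value
        elif current is not None and key == "Aggregator ID":
            current["aggregator_id"] = value
    return info


def find_problems(info):
    """Return human-readable notes about a parsed bond status."""
    problems = []
    if "802.3ad" not in (info["mode"] or ""):
        problems.append("bond is not in 802.3ad (LACP) mode")
    if "layer3+4" not in (info["hash_policy"] or ""):
        problems.append(
            "hash policy is %r, not layer3+4; traffic may concentrate on one link"
            % info["hash_policy"]
        )
    for slave in info["slaves"]:
        if slave["mii_status"] != "up":
            problems.append(
                "slave %s: MII status is %s" % (slave["name"], slave["mii_status"])
            )
    # Links that are up but report different aggregator IDs did not join the
    # same LACP aggregator, so they are not all carrying traffic together.
    up_ids = {s["aggregator_id"] for s in info["slaves"] if s["mii_status"] == "up"}
    if len(up_ids) > 1:
        problems.append("up slaves are split across aggregators: %s" % sorted(up_ids))
    return problems


def check_bond(name="bond0"):
    """Read the status file for one bond (hypothetical default name) and check it."""
    text = Path("/proc/net/bonding", name).read_text()
    return find_problems(parse_bond_status(text))
```

On a host with a bond, calling check_bond("bond0") returns a list of notes, and an empty list means none of these particular checks fired. It does not inspect the switch side of the MC-LAG, which would need to be checked separately.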