Hi,
for IPoIB this is probably the only way to use dual-port HCAs efficiently.
Since IPoIB can - AFAIK - only do bonding in active-backup (active/passive)
mode, it won't distribute traffic across both ports the way Ethernet link
aggregation would.
OTOH, running ceph on dynamically routed networks will put your routing
daemon (e.g. bird) in a SPOF position...
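For reference, a minimal active-backup bond over two IPoIB ports might look
roughly like this (interface names and the address are examples only;
depending on the HCA/driver you may also need the bonding fail_over_mac
option, since IPoIB slaves cannot have their hardware address rewritten):

  modprobe bonding
  ip link add bond0 type bond mode active-backup miimon 100
  ip link set ib0 down && ip link set ib0 master bond0   # slaves must be down to enslave
  ip link set ib1 down && ip link set ib1 master bond0
  ip link set bond0 up
  ip addr add 192.168.100.11/24 dev bond0                # example IPoIB subnet

Only one port carries traffic at any time, which is exactly the limitation
above.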
Cheers, Bastian
On 2016-06-06 15:29, Luis Periquito wrote:
Nick,
TL;DR: works brilliantly :)
Where I work we have all of the ceph nodes (and a lot of other stuff) using
OSPF and BGP server attachment. With that we're able to implement solutions
like anycast addresses, removing the need to add load balancers for the
radosgw solution.

The biggest issues we've had were around per-flow vs per-packet traffic load
balancing, but as long as you keep it simple you shouldn't have any issues.
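On the Linux side, if the hosts themselves have multiple equal-cost routes,
one knob worth knowing (assuming a reasonably recent kernel, 4.12+) is the
ECMP hash policy, which decides whether flows are hashed on addresses only
or on the full 5-tuple:

  # hash ECMP flows on src/dst IP plus ports instead of just the IPs
  sysctl -w net.ipv4.fib_multipath_hash_policy=1

The switches have their own equivalent per-flow hashing settings.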
Currently we have a P2P network between the servers and the ToR switches on
a /31 subnet, and then create a virtual loopback address, which is the
interface we use for all communications. Running tests like iperf we're able
to reach 19Gbps (on a 2x10Gbps network). OTOH we no longer have the ability
to separate traffic between the public and OSD networks, but we've never
really felt the need for it.
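As a rough sketch of the per-host part (all numbering made up), it boils
down to something like:

  ip addr add 100.64.0.1/31 dev eth0    # P2P /31 towards ToR A (the switch holds .0)
  ip addr add 100.64.0.3/31 dev eth1    # P2P /31 towards ToR B (the switch holds .2)
  ip addr add 10.10.10.11/32 dev lo     # the "server" address everything binds to

The routing daemon (bird, quagga, ...) then announces the /32 loopback, so
it stays reachable over whichever uplink is alive and ECMP can spread flows
across both.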
Also, spend a bit of time planning how the network will look and its
topology. If done properly (think of details like route summarization, so
that e.g. a rack's loopbacks can be announced as a single aggregate) it's
really worth the extra effort.
On Mon, Jun 6, 2016 at 11:57 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Hi All,
Has anybody had any experience with running the network routed all the way
down to the host?

I know the standard way most people configure their OSD nodes is to bond the
two NICs, which then talk via a VRRP gateway, and from then on the
networking is all Layer 3. The main disadvantage I see here is that you need
a beefy inter-switch link to cope with the amount of traffic flowing between
switches to the VRRP address. I’ve been trying to design around this by
splitting hosts into groups with different VRRP gateways on either switch,
but this relies on using active/passive bonding on the OSD hosts to make
sure traffic goes from the correct NIC to the directly connected switch.
What I was thinking: instead of terminating the Layer 3 part of the network
at the access switches, terminate it at the hosts. If each NIC of the OSD
host had a different subnet and the actual “OSD Server” address were bound
to a loopback adapter, OSPF should advertise this loopback address as
reachable via the two L3 links on the physically attached NICs. This should
give you a redundant topology which also respects your physical layout and
potentially gives higher performance due to ECMP.
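For illustration (addresses made up), I'd expect OSPF with ECMP to end up
installing something equivalent to this multipath route on each host for a
peer's loopback address:

  ip route add 10.254.1.12/32 \
      nexthop via 192.0.2.0 dev eth0 \
      nexthop via 192.0.2.2 dev eth1

i.e. the peer's /32 is reachable via either directly connected switch and
the kernel spreads flows across the two nexthops.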
Any thoughts, any pitfalls?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com