Re: Network redundancy pros and cons, best practice, suggestions?

Redundancy is a means to an end, not an end itself.

If you can afford to lose component X, manually replace it, and then return everything impacted to service, then there's no point in making X redundant.

If you can afford to lose a single disk (which Ceph certainly can), then there's no point in local RAID.

If you can afford to lose a single machine, then there's no point in redundant power supplies (although they can make power maintenance work a lot less complex).

If you can afford to lose everything attached to a switch, then there's no point in making it redundant.


Doing redundant networking to the host adds a lot of complexity that isn't really there with single-attached hosts.  For instance, what happens if one of the switches loses its connection to the outside world?  With LACP, you'll probably lose connectivity to half of your peers.  Doing something like OSPF, possibly with ECMP, avoids that problem, but certainly doesn't make things less complicated.
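
To make the OSPF option a bit more concrete, here is a minimal sketch of
"routing to the host" with something like Quagga on each node.  Addresses
and interface layout are made up; ECMP then comes from learning two
equal-cost routes, one via each switch:

    ! /etc/quagga/ospfd.conf -- illustrative only, hypothetical addressing
    router ospf
     ospf router-id 10.0.0.10
     ! a loopback /32 that the node's services are reached on,
     ! announced via either uplink
     network 10.0.0.10/32 area 0
     ! the two point-to-point links to the ToR switches
     network 192.0.2.0/31 area 0
     network 192.0.2.2/31 area 0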

In most cases, I'd avoid switch redundancy.  With more than 10 racks there's really no point, because you should be able to lose a rack without massive disruption.  With only a rack or two, I quite likely wouldn't bother either, simply because the extra switches become a much bigger share of the total cost and the added complexity isn't worth it in most cases.

It comes down to engineering tradeoffs, and the right balance is different in just about every situation: it's a function of budget, acceptance of risk, scale, performance, networking experience, and the cost of outages.


Scott

On Mon, Apr 13, 2015 at 4:02 AM Christian Balzer <chibi@xxxxxxx> wrote:

Hello,

On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote:

> Dear ceph users,
>
> we are planning a ceph storage cluster from scratch. Might be up to 1 PB
> within the next 3 years, multiple buildings, new network infrastructure
> for the cluster etc.
>
> I had some excellent trainings on ceph, so the essential fundamentals
> are familiar to me, and I know our goals/dreams can be reached. :)
>
> There is just "one tiny piece" in the design I'm currently unsure
> about :)
>
> Ceph follows some sort of keep-it-small-and-simple approach, e.g. don't
> use RAID controllers, use more boxes and disks, fast network etc.
>
While small and plentiful is definitely true, some people actually use RAID
for OSDs (like RAID1) to avoid ever having to deal with a failed OSD,
ending up with an effective 4x replication (RAID1 mirroring underneath 2x
Ceph replication).
Your needs and budget may of course differ.

> So from our current design we plan 40Gb Storage and Client LAN.
>
> Would you suggest connecting the OSD nodes redundantly to both networks?
> That would end up with 4 * 40Gb ports in each box and two switches to
> connect to.
>
If you can afford it, fabric switches are quite nice, as they allow for
LACP across 2 switches: when everything is working you get twice the
speed, and when it isn't you still have full redundancy. The Brocade VDX
stuff comes to mind.

However if you're not tied into an Ethernet network, you might do better
and cheaper with an Infiniband network on the storage side of things.
This will become even more attractive as RDMA support improves with Ceph.

Separating the public (client) and private (storage, OSD interconnect)
networks with Ceph only makes sense if your storage nodes can actually
utilize all that bandwidth.

So at your storage node density of 12 HDDs (16 HDD chassis are not space
efficient), 40GbE is overkill with a single link/network, insanely so with
2 networks.
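
If you do separate them, the split itself is just two settings in
ceph.conf (subnets here are of course made up):

    [global]
        public network  = 10.10.10.0/24
        cluster network = 10.10.20.0/24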

> I'd think of OSD nodes with 12 - 16 * 4TB SATA disks for "high" IO
> pools. (+ currently SSD for journal, but maybe by the time we start,
> LevelDB/RocksDB backends will be ready ... ?)
>
> Later some less IO-bound pools for data archiving/backup. (bigger and
> more disks per node)
>
> We would also do some Cache tiering for some pools.
>
> In the reference documentation from HP, Intel, Supermicro etc., they
> usually use non-redundant network connections. (single 10Gb)
>
> I know: redundancy keeps some headaches small, but it also adds more
> complexity and increases the budget. (additional network adapters,
> servers, switches, etc.)
>
Complexity not so much, cost yes.

> So what would you suggest, what are your experiences?
>
It all depends on how small (or rather, how large) you can start.

I have only small clusters with few nodes, so for me redundancy is a big
deal.
Thus those clusters use Infiniband, 2 switches and dual-port HCAs on the
nodes in an active-standby mode.

If, however, you can start with something like 10 racks (ToR switches),
losing one switch would mean a loss of 10% of your cluster, which is
something it should be able to cope with.
Especially if you configure Ceph to _not_ start re-balancing data
automatically when a rack goes down (so that you have a chance to put a
replacement switch in place, which you of course kept handy on-site for
just such a case). ^.-
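
For the record, something along these lines (do verify the exact option
names against your Ceph version):

    [mon]
        ; don't automatically mark OSDs out when a whole rack
        ; (or anything larger) disappears at once
        mon osd down out subtree limit = rack

plus "ceph osd set noout" / "ceph osd unset noout" around planned switch
maintenance.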

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
