We have space limitations in our DCs and so have to build as densely as possible. These clusters are two racks of 500 OSDs each, though there is more hardware en route to start scaling them out. With just two racks, the risk of losing a ToR and taking down
the cluster was enough to justify the slight added complexity of extra ToRs to ensure we have HA at that point in the architecture. It's not adding that much complexity, as it's all handled by configuration management once you get the kinks worked out the
first time. We use this architecture throughout our networks, so running it for Ceph is no different than running it for any of our other services. I also find it less complex and easier to debug than an MLAG setup.
We are currently running hosts with dual 10G NICs, one to each ToR, but are evaluating 25G or 40G for upcoming deploys.
Once we gain confidence in Ceph to expand beyond a couple thousand OSDs in a cluster, I will certainly look to simplify by cutting down to one higher-throughput ToR per rack.
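For anyone curious what a routed (non-MLAG) dual-ToR host setup can look like: one common approach is BGP to the host, where each NIC is an independent routed uplink and the host advertises a loopback /32 that Ceph binds to. This is only a sketch of that general pattern, not necessarily Aaron's exact setup; the ASN, addresses, and interface names are made up, and it assumes FRR with BGP unnumbered on the ToRs:

```
# /etc/frr/frr.conf -- hypothetical host-side config (addresses/ASN invented)
# Each NIC (eth0 -> ToR A, eth1 -> ToR B) is a separate routed link.
# The host advertises its loopback /32; losing one ToR just withdraws one path.
router bgp 65101
 bgp router-id 10.0.0.11
 neighbor eth0 interface remote-as external
 neighbor eth1 interface remote-as external
 address-family ipv4 unicast
  network 10.0.0.11/32
 exit-address-family
```

With ECMP on the ToRs, traffic to the host spreads across both uplinks while it's healthy, and failover is plain routing convergence rather than MLAG state machinery, which is part of why this tends to be easier to debug.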
The logical public/private separation is there to keep replication traffic on its own network and for ease of monitoring.
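In Ceph terms that separation maps to the public and cluster networks in ceph.conf. A minimal sketch, with example subnets that are assumptions rather than anything from this thread:

```
# ceph.conf fragment -- subnets are illustrative only
[global]
public_network  = 10.10.0.0/24   # client and monitor traffic
cluster_network = 10.20.0.0/24   # OSD replication/recovery traffic
```

Keeping the two on distinct subnets also makes per-network graphing straightforward, since replication and client traffic show up on different interfaces or VLANs.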
Aaron
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com