On 5/13/22 10:42, Daniel Persson wrote:
Hi Team
We have grown out of our current solution, and we plan to migrate to
multiple data centers.
Will there be active Ceph users in each of the data centers? Or is it
just for storage high availability and geo-redundancy and will there be
one data center with Ceph users?
Our setup is a mix of radosgw data and filesystem data. But many of our
legacy systems require a filesystem at the moment, so we will probably
run it for some of our data for at least 3-5 years.
At the moment, we have about 0.5 Petabytes of data, so it is a small
cluster. Still, we want more redundancy, so we will partner with a company
with multiple data centers within the city and have redundant fiber between
the locations.
Our current center has multiple 10 Gb/s connections, so communication
between the new locations and our existing data center will be slower.
Still, I hope the network traffic will suffice for a multi-data-center setup.
I assume you hope that the network traffic will _not_ suffer from a
multi-dc setup. What throughput and latency would you get between data
centers?
Currently, I plan to assign OSDs to different sites and racks so we can
configure a good replication rule to keep a copy of the data in each data
center.
If you want to replicate across data centers (size=3, min_size=2), with
the data center as the failure domain, this can be achieved with the
following CRUSH rule:
ceph osd crush rule create-replicated {name} {root} datacenter [{class}]
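A minimal sketch of creating such a rule and applying it to a pool; the rule name, pool name, and CRUSH root "default" are placeholders for your own:

```shell
# Create a replicated rule that places one copy per "datacenter" bucket
# in the CRUSH hierarchy (assumes your OSDs are already organized under
# datacenter buckets in the CRUSH map).
ceph osd crush rule create-replicated dc-replicated default datacenter

# Point an existing pool at the new rule and set the replication sizes.
ceph osd pool set mypool crush_rule dc-replicated
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2
```

With min_size=2, the pool keeps serving I/O after losing one full data center.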
My question is how to handle the monitor setup for good redundancy. For
example, should I set up two new monitors in each new location and keep one
in our existing data center, for five monitors in total? Or should I
keep it at three monitors, one per data center? Or should I go for
nine monitors, three in each data center?
Five monitors are really nice to have: you can lose one more monitor
than with three. If you hit an issue with a monitor, you have to be
able to fix it with the remaining monitors that are still running, and
any tuning or restarts on those remaining monitors to fix the problem
will cause downtime. More than five monitors should normally not be needed,
although Ceph users with very large clusters might run more than
five; I'm not sure about that (thinking of the CERN clusters and STFC Echo).
Should I use a stretch setup to define the location of each monitor?
Only if you want to do a dual-data-center setup (with two copies per DC).
When you have more than two data centers, say three, you don't need that.
In a 3-DC setup with five monitors in total, you obtain the highest
availability with two monitors in each of two data centers and one
monitor in the third.
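If the cluster is managed by cephadm, that 2/2/1 placement could be pinned explicitly, something along these lines (the hostnames are assumptions, substitute your own):

```shell
# Deploy exactly five monitors on named hosts:
# two in DC1, two in DC2, one in DC3.
ceph orch apply mon --placement="dc1-host1,dc1-host2,dc2-host1,dc2-host2,dc3-host1"
```

Losing any one data center then still leaves at least three monitors, enough for quorum out of five.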
Could you do the same for MDSes? Do I need to configure the mounting of the
filesystem differently to signal which data center the client is located in?
If I recall correctly, Dan from CERN has MDSes placed close to the
clients, and that helped improve performance. You would need multiple
active MDSes and either have them balance the load between them or do
(manual) directory pinning to lock certain users to a given MDS. There is
a lot of communication between a CephFS client and the MDS,
especially for metadata operations, so higher network latency might hurt
there; I guess it could be beneficial to optimize that. Depending on
the workload, and on whether snapshots are used, there might be
substantial internal MDS communication, defeating the purpose and/or
consuming traffic that could otherwise have been used for OSD/client
traffic.
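A sketch of what multiple active MDSes with manual directory pinning could look like; the filesystem name and path are assumptions, and ceph.dir.pin is the documented pinning attribute:

```shell
# Allow two active MDS daemons on the filesystem
# (filesystem name "cephfs" is a placeholder).
ceph fs set cephfs max_mds 2

# Pin a directory subtree to MDS rank 1, so clients working in it
# always talk to that MDS (path is a placeholder).
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/siteB
```

Combined with placing the MDS for rank 1 in the data center where those clients live, this keeps metadata traffic local.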
To come back to my first question: if only storage is distributed
between data centers but not Ceph users, then you might gain read
performance by putting all primary OSDs in the data center where the
Ceph users reside. You should be able to achieve that by adjusting
primary-affinity [1]. All reads (by default) come from the primary
OSDs, which would then all be located near the Ceph users with
10 Gb/s connectivity. Writes still have to be acknowledged by the OSDs
in the remote data centers, so I do not expect any gains there.
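A sketch of steering primaries that way: leave primary-affinity at the default 1.0 on the local OSDs and lower it on the remote ones (the OSD ids here are placeholders):

```shell
# OSDs in the remote data centers should (almost) never be primary,
# so set their primary-affinity to 0; local OSDs stay at 1.0.
ceph osd primary-affinity osd.12 0
ceph osd primary-affinity osd.13 0
```

CRUSH then chooses primaries only among OSDs whose affinity is non-zero, so reads are served from the local data center.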
So there are quite a few things you can take into consideration.
Gr. Stefan
[1]:
https://docs.ceph.com/en/quincy/rados/operations/crush-map/#primary-affinity
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx