Re: Multi-datacenter filesystem

On 5/13/22 10:42, Daniel Persson wrote:
Hi Team

We have grown out of our current solution, and we plan to migrate to
multiple data centers.

Will there be active Ceph users in each of the data centers? Or is it just for storage high availability and geo-redundancy, with Ceph users in only one data center?


Our setup is a mix of radosgw data and filesystem data. But we have many
legacy systems that require a filesystem at the moment, so we will probably
run it for some of our data for at least 3-5 years.

At the moment, we have about 0.5 Petabytes of data, so it is a small
cluster. Still, we want more redundancy, so we will partner with a company
with multiple data centers within the city and have redundant fiber between
the locations.

Our current center has multiple 10 Gb/s connections, so the communication
between the new locations and our existing data center will be slower.
Still, I hope the network traffic will suffice for a multi-datacenter setup.

I assume you hope that the network traffic will _not_ suffer from a multi-dc setup. What throughput and latency would you get between data centers?


Currently, I plan to assign OSDs to different sites and racks so we can
configure a good replication rule to keep a copy of the data in each data
center.

If you want to replicate across data centers (size=3, min_size=2), with the data center as the failure domain, this should be achievable with the following CRUSH rule:

ceph osd crush rule create-replicated {name} {root} datacenter [{class}]
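
As a concrete sketch (the bucket names dc1/dc2, the rack name rack1, and the pool name cephfs_data are placeholders; this assumes your hosts are already grouped under racks), the CRUSH hierarchy and rule could be set up like:

```shell
# Create datacenter buckets under the default root and move the racks into them:
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move rack1 datacenter=dc1

# Replicated rule that places one copy per data center:
ceph osd crush rule create-replicated replicated_dc default datacenter

# Apply the rule to a pool and set the replication parameters:
ceph osd pool set cephfs_data crush_rule replicated_dc
ceph osd pool set cephfs_data size 3
ceph osd pool set cephfs_data min_size 2
```

With three data centers and size=3 this keeps exactly one replica per DC; min_size=2 keeps the pool writable if one DC is down.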



My question is how to handle the monitor setup for good redundancy. For
example, should I set up two new monitors in each new location and have one
in our existing data center, so I get five monitors in total, or should I
keep it as three monitors, one for each data center? Or should I go for
nine monitors, 3 in each data center?

Five monitors are really nice to have: you can lose one extra monitor compared to three. If you hit an issue with a monitor, you have to be able to fix it with the monitors that are still running, and any tuning or restarts on those remaining monitors in order to fix the problem will cause downtime. More than five monitors should normally not be needed, although Ceph users with very large clusters might run with more than five; I am not sure about that (thinking of the CERN clusters, STFC Echo).
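
With cephadm/the orchestrator, pinning five monitors to specific hosts and checking quorum could look like this (the host names are placeholders):

```shell
# Place one monitor on each of these five hosts, spread 2+2+1 over the DCs:
ceph orch apply mon --placement="mon-dc1-a mon-dc1-b mon-dc2-a mon-dc2-b mon-dc3-a"

# Verify that all five monitors have joined the quorum:
ceph quorum_status --format json-pretty
```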


Should I use a Stretch set up to define the location of each monitor?

Only if you want to do a dual data center setup (with 2 copies per DC). When you have more than 2 data centers, say 3, you don't need that. In a 3-DC setup with 5 monitors in total, you would obtain the highest availability with two data centers running 2 monitors each and one data center running 1 monitor.
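
If you did go the two-data-center route, a stretch mode sketch would be along these lines (monitor names a..e and the CRUSH rule name stretch_rule are placeholders; the fifth monitor acts as tiebreaker in a third location):

```shell
# Tell each monitor where it lives in the CRUSH hierarchy:
ceph mon set_location a datacenter=dc1
ceph mon set_location b datacenter=dc1
ceph mon set_location c datacenter=dc2
ceph mon set_location d datacenter=dc2
ceph mon set_location e datacenter=dc3   # tiebreaker monitor

# Enable stretch mode, naming the tiebreaker, the stretch CRUSH rule,
# and the dividing bucket type:
ceph mon enable_stretch_mode e stretch_rule datacenter
```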


Could you do the same for MDSes? Do I need to configure the mounting of the
filesystem differently to signal in which data center the client is located?

If I recall correctly, Dan from CERN has MDSes placed close to the clients, and that helped to improve performance. You would need multiple active MDSes and either have them balance the load between them or do (manual) directory pinning to pin certain users to a given MDS. There is a lot of communication between a CephFS client and the MDS, especially for metadata operations. Higher network latency might hurt there, so I guess it could be beneficial to optimize that. Depending on the workload, and on whether snapshots are used, there might be substantial internal MDS communication, defeating the purpose and/or consuming bandwidth that could otherwise have been used for OSD / client traffic.
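
A sketch of multiple active MDSes with manual directory pinning (the filesystem name cephfs, the mount point, and the directory names are placeholders):

```shell
# Allow two active MDS daemons for the filesystem:
ceph fs set cephfs max_mds 2

# Pin directory subtrees to specific MDS ranks; -v -1 would hand a
# subtree back to the default balancer:
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/legacy
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/archive
```

Pinning is set from a client with the filesystem mounted; clients in a given data center would then mostly talk to the MDS rank serving their subtree.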

To come back to my first question: if only storage is distributed between data centers but not Ceph users, then you might gain read performance by putting all primary OSDs in the data center where the Ceph users reside. You should be able to obtain that by adjusting primary-affinity [1]. All reads (by default) come from the primary OSDs, which would then all be located in the proximity of the Ceph users with 10 Gb/s connectivity. Writes still have to be acknowledged by the OSDs in the remote data centers, so I do not expect any gains there.
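
In practice that would mean lowering primary-affinity on the remote OSDs (the OSD IDs below are placeholders; 1.0 is the default, 0 means "avoid as primary whenever possible"):

```shell
# Inspect the CRUSH tree to see which OSD IDs sit in the remote DCs:
ceph osd tree

# Make the remote OSDs unlikely to be chosen as primary:
ceph osd primary-affinity osd.12 0
ceph osd primary-affinity osd.13 0
```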

So there are quite a few things you can take into consideration.

Gr. Stefan

[1]: https://docs.ceph.com/en/quincy/rados/operations/crush-map/#primary-affinity

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


