Dear Arnaud,

Thanks a lot for sharing your valuable experience; this method for CephFS disaster recovery is really unique and intriguing!

Out of curiosity, how do you back up the CephFS metadata? Should the backup be taken very frequently to avoid losing much data? Is any special tool needed to back up the CephFS metadata?

Best regards,
Samuel
huxiaoyu@xxxxxxxxxxxx

From: Arnaud M
Date: 2022-04-01 12:28
To: huxiaoyu@xxxxxxxxxxxx
CC: ceph-users
Subject: Re: Ceph remote disaster recovery at PB scale

Hello

I will speak about CephFS because that is what I am working on.

Of course you can do some kind of rsync or rclone between two CephFS clusters, but at petabyte scale it will be really slow and will cost a lot!

There is another approach that we tested successfully (only in test, not in production).

We created a replicated CephFS data pool (replica 3) and spread it across 3 datacenters: Beauharnois (Canada), Strasbourg (France) and Warsaw (Poland). So we had 1 replica per datacenter.

Only the CephFS metadata pool was on SSD (NVMe), close to the end users (in Strasbourg, France). Same for the MONs and MGRs (also in Strasbourg); in fact only the CephFS data was spread geographically.

We had high bandwidth and high latency (of course) between the datacenters, but it worked surprisingly well.

This way you can lose up to two datacenters without losing any data (more if you use more replicas). You just have to back up the MON and CephFS metadata, which are never a lot of data.

This strategy is only feasible for CephFS (as it is the least IOPS-demanding). If you need more IOPS, you should isolate the IOPS-hungry folders and run them on a separate pool backed by local SSDs. (Rough command sketches for these setups follow below the thread.)

All the best

Arnaud
Leviia https://leviia.com/en

On Fri, 1 Apr 2022 at 10:57, huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:

Dear Ceph experts,

We are operating some Ceph clusters (both L and N versions) at PB scale and are now planning remote disaster recovery solutions. Among these clusters, most hold RBD volumes for OpenStack and K8s, a few serve S3 object storage, and very few are CephFS clusters.

For RBD volumes, we are planning to use RBD mirroring, and the data volume will reach several PBs. My questions are:

1) Is RBD mirroring with petabytes of data doable or not? Are there any practical limits on the total data size?
2) Should I use parallel rbd-mirror daemons to speed up the sync process, or would a single daemon be sufficient?
3) How far could the remote site lag behind: at most 1 minute, or more like 10 minutes?

For the S3 object store, we plan to use multisite replication, and thus:

4) Are there any practical limits on the total data size for S3 multisite replication?

And for CephFS data, I have no idea yet:

5) What would be the best practice for a CephFS disaster recovery scheme?

Thanks a lot in advance for your suggestions,

Samuel
huxiaoyu@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
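
On the metadata-backup question above: a minimal sketch of one way to do it, assuming a metadata pool named cephfs_metadata, a filesystem named cephfs and a /backup directory (all placeholder names) on a Nautilus-era toolset. This is not necessarily how Arnaud does it.

  # Dump the whole metadata pool to a file; CephFS metadata pools are usually small
  # (note: on a live filesystem this export is not crash-consistent)
  rados -p cephfs_metadata export /backup/cephfs_metadata.$(date +%F).bin

  # Optionally also export the MDS journal of rank 0
  cephfs-journal-tool --rank=cephfs:0 journal export /backup/cephfs_journal.$(date +%F).bin

  # Keep copies of the cluster maps as well
  ceph mon getmap -o /backup/monmap.$(date +%F)
  ceph osd getmap -o /backup/osdmap.$(date +%F)

How often to run it depends on how much metadata churn you can afford to lose; since the metadata pool is small, running it hourly or even more often is usually cheap.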
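
For the 3-datacenter layout Arnaud describes (one replica per datacenter, metadata on local NVMe), a rough sketch of the CRUSH side; the bucket, rule and pool names (bhs, sbg, waw, rep_per_dc, meta_nvme, cephfs_data, cephfs_metadata) are made up for illustration.

  # Declare one CRUSH bucket per datacenter and hang them under the default root
  ceph osd crush add-bucket bhs datacenter
  ceph osd crush add-bucket sbg datacenter
  ceph osd crush add-bucket waw datacenter
  ceph osd crush move bhs root=default
  ceph osd crush move sbg root=default
  ceph osd crush move waw root=default
  # (each host then gets moved under its datacenter bucket, e.g.
  #  ceph osd crush move host1 datacenter=sbg)

  # Replicated rule that places one copy per datacenter
  ceph osd crush rule create-replicated rep_per_dc default datacenter
  ceph osd pool set cephfs_data crush_rule rep_per_dc
  ceph osd pool set cephfs_data size 3

  # Keep the metadata pool on local NVMe via a device-class rule
  ceph osd crush rule create-replicated meta_nvme default host nvme
  ceph osd pool set cephfs_metadata crush_rule meta_nvme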
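
And for isolating a high-IOPS folder on a separate local SSD pool, a sketch assuming a new pool cephfs_data_ssd, a filesystem named cephfs mounted at /mnt/cephfs, and a directory hot/ (all hypothetical names).

  # Create the SSD-backed pool and attach it to the filesystem as an extra data pool
  ceph osd pool create cephfs_data_ssd 64
  ceph fs add_data_pool cephfs cephfs_data_ssd

  # Newly created files under this directory will be stored in the SSD pool
  # (existing files are not moved; only new files pick up the layout)
  setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/hot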
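
On the RBD questions (1-3): as far as I know there is no hard size cap in rbd-mirror itself; the practical limits are the journaling overhead on the images and the WAN bandwidth, and the lag simply grows whenever the link cannot keep up with the write rate. Recent releases (Nautilus and later, if I remember correctly) can run several active rbd-mirror daemons per cluster. A bare-bones sketch of journal-based mirroring for a pool named volumes, with site-a / site-b as hypothetical cluster names, vm-disk-1 as a hypothetical image and client.rbd-mirror as a placeholder user:

  # On both clusters: enable mirroring for the pool (pool mode mirrors every journaled image)
  rbd mirror pool enable volumes pool

  # Journal-based mirroring needs the journaling feature (exclusive-lock is on by default)
  rbd feature enable volumes/vm-disk-1 journaling

  # Register each cluster as the other's peer (pre-Octopus style)
  rbd --cluster site-a mirror pool peer add volumes client.rbd-mirror@site-b
  rbd --cluster site-b mirror pool peer add volumes client.rbd-mirror@site-a

  # Run the rbd-mirror daemon at the receiving site, then watch the per-image lag
  rbd --cluster site-b mirror pool status volumes --verbose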
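
On question (4): I am not aware of a documented hard limit on total data size for RGW multisite either; in practice object count, change rate and sync/shard tuning tend to matter more than raw capacity. A heavily abbreviated sketch of the usual realm/zonegroup/zone setup, where every name, endpoint and credential below is a placeholder:

  # Primary site
  radosgw-admin realm create --rgw-realm=prod --default
  radosgw-admin zonegroup create --rgw-zonegroup=eu --endpoints=http://rgw-primary:8080 --master --default
  radosgw-admin zone create --rgw-zonegroup=eu --rgw-zone=eu-primary --endpoints=http://rgw-primary:8080 --master --default
  radosgw-admin user create --uid=sync-user --display-name="sync user" --system
  radosgw-admin zone modify --rgw-zone=eu-primary --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
  radosgw-admin period update --commit

  # Secondary (DR) site
  radosgw-admin realm pull --url=http://rgw-primary:8080 --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
  radosgw-admin zone create --rgw-zonegroup=eu --rgw-zone=eu-dr --endpoints=http://rgw-dr:8080 --access-key=SYNC_ACCESS_KEY --secret=SYNC_SECRET_KEY
  radosgw-admin period update --commit

  # After restarting the RGWs, watch replication progress
  radosgw-admin sync status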