Re: Federated gateways (our planning use case)

Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> · Mon, 6 Oct 2014 17:33:05 -0700

This sounds doable, with a few caveats.

Currently, replication works in one direction only.  You can write only to the primary zone, and you can read from either the primary or the secondary zones.  A cluster can host many zones.

I'm thinking your setup would be a star topology.  Each telescope will be a primary zone, and replicate to a secondary zone in the main storage cluster.  The main cluster will have one read-only secondary zone for each telescope.  If you have other needs to write data to the main cluster, you can create another zone that only exists on the main cluster (possibly replicated to one of the telescopes with a good network connection).
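A minimal sketch of that star layout as plain data, in case it helps with planning. The telescope names ("alpha", "beta"), the zone names, and the cluster name "main" are all made up for illustration; nothing here talks to Ceph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str     # hypothetical zone name
    cluster: str  # cluster that hosts the zone
    role: str     # "primary" (read/write) or "secondary" (read-only)


def star_topology(telescopes, main_cluster="main"):
    """Return one (primary, secondary) zone pair per telescope."""
    pairs = {}
    for t in telescopes:
        # The primary zone lives on the telescope's own small cluster.
        primary = Zone(f"{t}-site", cluster=t, role="primary")
        # Its read-only secondary lives on the main storage cluster.
        secondary = Zone(f"{t}-campus", cluster=main_cluster, role="secondary")
        pairs[t] = (primary, secondary)
    return pairs


def can_write(zone):
    # Replication is one-directional, so only primary zones accept writes.
    return zone.role == "primary"


topology = star_topology(["alpha", "beta"])
for primary, secondary in topology.values():
    print(f"{primary.name} ({primary.cluster}) -> {secondary.name} ({secondary.cluster})")
```

Every secondary zone ends up on the main cluster, which is what makes it a star: the telescopes never replicate to each other.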

Each zone has its own URL (primary and secondary), so you'd have a bit of a management problem remembering to use the correct URL.  The URLs can be whatever you like.  Convention follows Amazon's naming scheme, but you'd probably want to create your own scheme, something like http://telescope_name-site.inasan.ru/ and http://telescope_name-campus.inasan.ru/
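One way to avoid mixing up the URLs is to let a small helper pick the URL from the operation. This is only a sketch of the naming scheme suggested above; the telescope name "alpha" and the function names are hypothetical, and a real hostname should avoid underscores.

```python
BASE_DOMAIN = "inasan.ru"  # taken from the example URLs above


def zone_url(telescope, location):
    """Build a zone URL; location is "site" (primary) or "campus" (secondary)."""
    if location not in ("site", "campus"):
        raise ValueError(f"unknown location: {location!r}")
    return f"http://{telescope}-{location}.{BASE_DOMAIN}/"


def url_for(telescope, operation, prefer="campus"):
    """Pick the URL for an operation: writes must go to the primary zone."""
    if operation == "write":
        return zone_url(telescope, "site")
    if operation == "read":
        # Reads work against either zone; default to the main-cluster copy.
        return zone_url(telescope, prefer)
    raise ValueError(f"unknown operation: {operation!r}")


print(url_for("alpha", "write"))  # http://alpha-site.inasan.ru/
print(url_for("alpha", "read"))   # http://alpha-campus.inasan.ru/
```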

You might have some problems with the replication if your VPN connections aren't stable.  The replication agent isn't very tolerant of cluster problems, so I suspect (but haven't tested) that long VPN outages will need a replication agent restart.  For sites that don't have permanent connections, just make the replication agent startup and shutdown part of the connection startup and shutdown process.  Replication state is available via a REST API, so it can be monitored.
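Tying the agent's lifetime to the connection could look roughly like the sketch below. It is a generic process wrapper, not anything specific to the replication agent: the command is a stand-in (a Python process that just sleeps), and the real agent command line is something you would have to fill in yourself.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def replication_agent(command, stop_timeout=10):
    """Run the agent for as long as the connection is up."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        proc.terminate()  # ask the agent to exit
        try:
            proc.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if it does not exit in time
            proc.wait()


# ASSUMPTION: stand-in command so the sketch runs anywhere.
# Replace it with the real replication agent command.
AGENT_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]

with replication_agent(AGENT_CMD) as agent:
    # The connection is up here; replication runs in the background.
    running_during = agent.poll() is None

stopped_after = agent.poll() is not None
print(running_during, stopped_after)
```

The connection startup script would enter the `with` block once the VPN is up and leave it just before the VPN is torn down, so the agent never runs against a dead link.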

I have tested large backlogs in replication.  When I initially imported my data, I deliberately imported faster than I had bandwidth to replicate.  At one point, my secondary cluster was ~10 million objects and ~10TB behind the primary cluster.  It eventually caught up, but the process doesn't handle stops and restarts well.  Restarting replication while it is working through a backlog makes it start again from the beginning of the backlog.  This can be a problem if your backlog is so large that it won't finish in a day, because log rotation will restart the replication agent.  If that's something you think might be a problem, I have some strategies to deal with it, but they're manual and hacky.
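To judge whether a backlog can clear between daily restarts, a back-of-the-envelope estimate is enough. The sketch below uses the ~10TB figure from above; the 100 Mbit/s link speed is purely an assumption for illustration, and protocol overhead is ignored.

```python
def catch_up_seconds(backlog_bytes, link_bytes_per_s, ingest_bytes_per_s=0.0):
    """Seconds needed to clear a backlog while new data keeps arriving."""
    net_rate = link_bytes_per_s - ingest_bytes_per_s
    if net_rate <= 0:
        return float("inf")  # the backlog never shrinks
    return backlog_bytes / net_rate


backlog = 10e12   # ~10 TB, the backlog mentioned above
link = 100e6 / 8  # ASSUMPTION: 100 Mbit/s link, converted to bytes per second
hours = catch_up_seconds(backlog, link) / 3600
print(f"{hours:.0f} hours")            # 222 hours
print("fits in one day:", hours < 24)  # fits in one day: False
```

With those numbers the backlog needs more than nine days of uninterrupted replication, so a restart every day would keep sending it back to the start.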

Does that sound feasible?

On Mon, Oct 6, 2014 at 5:42 AM, Pavel V. Kaygorodov <pasha@xxxxxxxxx> wrote:
Hi!

Our institute is now planning to deploy a set of robotic telescopes across the country.

Most of the telescopes will have low bandwidth and high latency, or even non-permanent internet connectivity.

I think we can set up synchronization of observational data with Ceph, using federated gateways:

1. The main, large Ceph storage cluster will be set up in our institute's main building

2. Small Ceph clusters will be set up near each telescope, to store only the data from the local telescope

3. VPN tunnels will be set up from each telescope site to our institute

4. The federated gateways mechanism will do all the magic to synchronize data

Is this a realistic plan?

What problems might we run into with this setup?

Thanks in advance,

  Pavel.

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
