Re: Use case: one-way RADOS "replication" between two clusters by time period


 



RadosGW Federation can fulfill this use case: http://ceph.com/docs/master/radosgw/federated-config/ .  Whether it counts as "easily" depends on your setup.

To start, radosgw-agent handles the replication.  It replicates both the metadata (users and buckets) and the data (objects in a bucket).  Replication only flows from the primary to the secondary, so you're good there.

It tracks what's been replicated, and maintains this state in (I believe) the secondary cluster.  If replication is started up after being down, it starts from the last replication timestamp, and runs up to now (whatever is "now" when the run starts).  Objects that have been deleted and garbage collected in the primary won't replicate, but it won't cause the replication to fail.


The current version of radosgw-agent, 1.2, attempts to get everything from its last replication timestamp to the present in a single pass.  It doesn't persist its replication state until it finishes that pass.  Because of this, any interruption makes the replication start over from that last timestamp.
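To make the restart behavior concrete, here is a toy model in Python.  It is not radosgw-agent code; the function name, its parameters, and the idea of saving state every N entries are all hypothetical and only there for contrast.  It compares saving the checkpoint only at the end of a pass (what I described above) with saving it periodically.

```python
# Toy model only: this is NOT radosgw-agent code. All names are hypothetical.

def replicate(entries, checkpoint, fail_at=None, persist_every=None):
    """Copy entries newer than `checkpoint`; return (saved_checkpoint, copied).

    entries:       sorted list of timestamps, one per object to replicate
    fail_at:       index at which the connection is simulated to drop
    persist_every: if set, save the checkpoint after every N copied entries;
                   if None, save it only when the whole pass finishes
    """
    pending = [ts for ts in entries if ts > checkpoint]
    copied = 0
    saved = checkpoint
    for i, ts in enumerate(pending):
        if fail_at is not None and i == fail_at:
            # Interrupted: only the checkpoint saved so far survives.
            return saved, copied
        copied += 1
        if persist_every and copied % persist_every == 0:
            saved = ts
    # The pass finished, so the checkpoint advances to the newest entry.
    return (pending[-1] if pending else checkpoint), copied


log = list(range(1, 11))  # ten objects with timestamps 1..10

# Checkpoint saved only at the end: an interruption loses all progress.
print(replicate(log, 0, fail_at=7))                   # (0, 7)

# Hypothetical periodic checkpoint: the next run resumes after timestamp 6.
print(replicate(log, 0, fail_at=7, persist_every=2))  # (6, 7)
```

In the first call seven objects were copied, but the saved checkpoint is still 0, so the next run starts from the beginning.  With a large bucket and an unreliable link, that pass may never get the chance to finish.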

This is really only a problem if you have large buckets.  If you have many buckets with a small amount of data, you'll just want to run a lot of replication threads.  I have a few buckets, with ~1M objects and ~1 TiB of data per bucket.  It took me a while to figure out that nightly log rotation was restarting the daemon, which threw away the pass in progress.  Once I disabled log rotation, I ran into problems with the stability of my VPN connection.


It's definitely doable.  I would set up some virtual test clusters and try it out.



On Thu, Oct 16, 2014 at 2:05 AM, Anthony Alba <ascanio.alba7@xxxxxxxxx> wrote:
Hi list,

Can RADOS fulfil the following use case:

I wish to have a radosgw-S3 object store that is "LIVE",
this represents "current" objects of users.

Separated by an air-gap is another radosgw-S3 object store that is "ARCHIVE".

The objects will only be created and manipulated by radosgw.

Periodically, (on the order of 3-6 months), I want to connect the two
clusters and replicate all objects from LIVE to ARCHIVE created from
"time period DDMMYYYY1 - DDMMYYYY2" or better yet "from
the last timestamp" . This is a "one" way replication and the objects
are transferred only in the LIVE ==> ARCHIVE direction.

Can this be done easily?

Thanks
Anthony
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

