Re: Backup strategies for rgw s3

Well, using Ceph as its own backup system has its merits, and I've
little doubt something could be cooked up, but an alternative would
be to use a dedicated backup system.

In my particular case, I use the Bacula backup system. It's not the
most polished thing around, but it is a full-featured backup/restore
solution, including archive aging, compressed backups, and even
things like automated tape library management. I do daily incremental
backups and weekly full backups.
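
In bacula-dir.conf that cycle is just a Schedule resource; a minimal
sketch (the resource name is made up):

    Schedule {
      Name = "WeeklyCycle"
      Run = Level=Full sun at 23:05             # weekly full backup
      Run = Level=Incremental mon-sat at 23:05  # daily incrementals
    }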

Bacula works by linking clients that can read the filesystem (or
buckets, in your case) to backend storage units, which can be
physical devices or disk directories. In my case, I back up my Ceph
filesystem to a directory, with each backup volume size-limited to
fit on a DVD in case I ever want non-magnetic long-term storage.
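
The size cap is just a Pool setting paired with a file-backed storage
device; roughly (names, paths, and retention below are made up):

    # bacula-sd.conf -- a disk-directory storage device
    Device {
      Name = FileStorage
      Media Type = File
      Archive Device = /srv/bacula/volumes
      LabelMedia = yes
      Random Access = yes
      AutomaticMount = yes
      RemovableMedia = no
      AlwaysOpen = no
    }

    # bacula-dir.conf -- cap each volume to fit on a single-layer DVD
    Pool {
      Name = FilePool
      Pool Type = Backup
      Maximum Volume Bytes = 4G
      Volume Retention = 90 days
      Recycle = yes
      AutoPrune = yes
    }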

The backup volume file format is analogous to a tarball in that it
contains directory and attribute metadata, making for a faithful
backup and restore. There are offline utilities that can be used to
restore even if the Bacula Director and its catalog are unavailable.
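
Those offline tools are bls and bextract; for example (the volume and
device names are made up):

    # List a volume's contents without the catalog
    bls -c bacula-sd.conf -V Vol0001 FileStorage

    # Extract everything on that volume into /tmp/restore
    bextract -c bacula-sd.conf -V Vol0001 FileStorage /tmp/restore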

The Bacula solution for backing up from S3 is a plugin for the
Enterprise Edition product. What it actually does is download the
bucket data from the S3 server to a local spool file, fetch the S3
metadata directly, then transmit both to the linked storage daemon
via the standard mechanisms.
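
If you wanted to roll your own along the same lines, the
spool-and-capture idea is easy to sketch. This is not the plugin's
code, just a rough illustration using boto3; the endpoint,
credentials, bucket, and spool path are all placeholders:

    # Sketch of the spool-then-backup idea: pull each object's data and
    # its S3 metadata into local files that a backup client can read.
    import json
    import os

    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:8080",  # placeholder RGW endpoint
        aws_access_key_id="ACCESS",                  # placeholder credentials
        aws_secret_access_key="SECRET",
    )

    SPOOL = "/var/spool/s3-backup"
    os.makedirs(SPOOL, exist_ok=True)

    for page in s3.get_paginator("list_objects_v2").paginate(Bucket="mybucket"):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            safe = key.replace("/", "_")
            # Spool the object data itself...
            s3.download_file("mybucket", key, f"{SPOOL}/{safe}.data")
            # ...and capture the S3-side metadata next to it.
            head = s3.head_object(Bucket="mybucket", Key=key)
            with open(f"{SPOOL}/{safe}.meta.json", "w") as f:
                json.dump({
                    "Key": key,
                    "ContentType": head.get("ContentType"),
                    "Metadata": head.get("Metadata", {}),
                    "LastModified": head["LastModified"].isoformat(),
                }, f)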

   Tim

On Wed, 2024-09-25 at 16:57 +0200, Adam Prycki wrote:
> Hi,
> 
> I'm currently working on a project which requires us to back up 2
> separate s3 zones/realms and retain the backups for a few months. The
> requirements were written by someone who doesn't know ceph rgw
> capabilities. We have to do incremental and full backups, and each
> type of backup has a separate retention period.
> 
> Is there a way to accomplish this in a sensible way?
> 
> My first idea would be to create multisite replication to an
> archive zone, but I cannot really enforce data retention on the
> archive zone: it would require us to overwrite lifecycle policies
> created by our users. As far as I know, it's not possible to create a
> zone-level lifecycle policy. User accounts are provisioned via
> OpenStack Swift.
> 
> My second idea would be to create a custom backup script and copy
> all the buckets in the cluster to a different s3 zone. The
> destination buckets could all be versioned to get the desired
> retention, but this option feels very hackish and messy. Backing up 2
> separate s3 zones to one could cause collisions in bucket names.
> Prefixing bucket names with additional information is not safe
> because bucket names have a fixed maximum length, and prefixing
> object key names is also not ideal.
> 
> Best regards
> Adam Prycki
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


