Re: S3 and RBD backup

Janne Johansson <icepic.dz@xxxxxxxxx> · Thu, 19 May 2022 09:45:05 +0200

On Wed, 18 May 2022 at 22:33, Sanjeev Jha <sanjeev_mac@xxxxxxxxxxx> wrote:
>
> Thanks Janne for the information in detail.
>
> We have an RHCS 4.2 non-collocated setup in one DC only. There are a few RBD volumes mapped to a MariaDB database.
> Also, an S3 endpoint with a bucket is being used to upload objects. No multisite zone has been implemented yet.
> My requirement is to take backups of the RBD images and the database.
> How can S3 bucket backup and restore be done?
> We are looking at several open-source tools, such as rclone for S3 and Benji for RBD, but are not sure whether these tools would be enough to achieve our backup goal.
> Your suggestion based on the above case would be much appreciated.

Unfortunately, your reply mostly repeated that you want to back up RBD
and S3, plus the OS version in use and one of the programs the clients
run, but said almost nothing about the various dimensions I listed
(and there are probably more than those I managed to think of when I
wrote it), which would affect the actual choices you have.

When my customers state "I want backup", we need to ask if they mean
"crash recovery", "keeping one copy for each of the last 30, 90 or 365
days" or "archiving a single copy of some data for 1, 5, 10 or 15
years for legal reasons". Those three scenarios are all VERY
different, and handled in very different ways. You can't just take a
10-year-archive solution and hope that it will work fine for "my live
payment database just crashed, can we be up again in 5 minutes?"
cases.

So if you don't know now what you want or need, chances are very low
that any suggestion you receive will be correct. You can spend months
working on the wrong solution, trying to adapt it to the actual
problems you later realize you have, when doing some design work
beforehand would have saved you lots of time.

If you look at the other answers on this thread, you see choices made
like "we do x,y,z to move certain data over to our existing backup
program because we can't afford a full replica site". There, someone
knew a replication site would be a decent option, but cost or space or
some other constraint made that option less attractive than rigging an
export translation in order to utilize the not-optimal but existing
backup framework already in use and invested in. That is what I tried
to describe in my first reply. You must figure out what is and isn't
important, and what is and isn't possible in your environment in terms
of CPU, network, disk and license costs, whether clients must stop
accessing the data while backups run, and whether best-effort copies
are good enough or your clients will be angry at you at restore time
for backing up an SQL db while writes were in flight, so that it isn't
easily restartable or consistent.

Backing up moving data is hard, and it has become harder over the last
50+ years because those who have worked with storage and backups have
seen all the ways restores fail.

At first people would think backups are just copying a file from A to
B, often manually. Then users "forgot" to do that, so some scripts had
to run at intervals so the computer would remember to do it every X
hours. Then any file that was open would not get a decent backup for
reasons. Then servers needed backup, and they started running
databases that would never be still (if the company gets to choose)
and those were even more important to back up. Then you would get a
night-time window to make backups, perhaps between 02:00 and 03:00.

Then data grew like it always does, so 1h was not enough. So you need
to either skip backups, stop backing up at 03:00 regardless, or figure
out a more efficient way to read, transfer and store data so backups
go fast again even at larger sizes. This eats resources at both ends,
and hence backup servers need more capacity to handle many
CPU-intensive backups between 02:00 and 03:00. You start doing
incremental backups. That is super nice, except now restores take a
lot more time. Those almost-an-hour-long backup jobs that only send
new data over to the nightly backup still have to be replayed at
restore time, one incremental after another, on top of the last full
backup.
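
To make the restore cost concrete, here is a minimal sketch of what a
restore from "last full plus incrementals" has to do. The `Backup`
class and the `restore_chain`/`restore` helpers are hypothetical names
made up for this illustration and do not correspond to any particular
backup tool; real tools work on blocks or objects rather than an
in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    day: int                                     # day the backup was taken
    kind: str                                    # "full" or "incr"
    files: dict = field(default_factory=dict)    # path -> content written
    deleted: set = field(default_factory=set)    # paths removed since last backup


def restore_chain(backups, target_day):
    """Return every backup that must be read to restore target_day."""
    candidates = sorted(
        (b for b in backups if b.day <= target_day), key=lambda b: b.day
    )
    full_positions = [i for i, b in enumerate(candidates) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup at or before the target day")
    # Everything from the most recent full up to the target day is needed.
    return candidates[full_positions[-1]:]


def restore(backups, target_day):
    """Rebuild the file set by replaying the chain in order."""
    state = {}
    for b in restore_chain(backups, target_day):
        if b.kind == "full":
            state = dict(b.files)
        else:
            state.update(b.files)
            for path in b.deleted:
                state.pop(path, None)
    return state
```

The nightly job only writes one small incremental, but the restore
reads the whole chain, so the further you are from the last full
backup, the longer it takes.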

Restores will then not take "an hour" just because the nightly used to
take an hour. Suddenly you need a faster network and faster
tapes/drives, but the boss wants to spend company money on sailboat
racing adverts instead of expensive backup hardware that isn't making
a profit for the company, so whatever solution you want needs to cost
nothing but still have capacity for 90 daily backups and full data
copies that restore in an instant. All of those can't be fulfilled at
the same time, so you have to do your homework and figure out which
demands are super important and which parts are nice to have.

They can't all be true at the same time.
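
As a rough back-of-the-envelope sketch of that trade-off, the function
below estimates how much storage a retention period needs and how much
data the worst-case restore has to read. The function name, the
parameters and the example figures (10 TB of data, 5% daily change)
are assumptions picked for illustration, not measurements from any
real cluster, and compression and deduplication are ignored.

```python
import math


def retention_estimate(full_tb, daily_change, days, full_every):
    """Estimate storage and worst-case restore volume, both in TB.

    full_tb      -- size of one full backup
    daily_change -- fraction of the data that changes per day (0..1)
    days         -- number of daily restore points to keep
    full_every   -- take a full backup every N days (1 = fulls only)
    """
    if days < 1 or full_every < 1:
        raise ValueError("days and full_every must be at least 1")
    fulls = math.ceil(days / full_every)
    incrementals = days - fulls
    incr_tb = full_tb * daily_change
    storage_tb = fulls * full_tb + incrementals * incr_tb
    # Worst case: restoring the day just before the next full backup.
    worst_restore_tb = full_tb + (full_every - 1) * incr_tb
    return storage_tb, worst_restore_tb


if __name__ == "__main__":
    for label, full_every in (("daily fulls", 1), ("weekly fulls", 7)):
        storage, restore = retention_estimate(10, 0.05, 90, full_every)
        print(f"{label}: store {storage:.1f} TB, restore reads {restore:.1f} TB")
```

With these assumed numbers, 90 daily fulls need about 900 TB but each
restore reads a single 10 TB copy, while weekly fulls with daily
incrementals need about 168.5 TB but the worst restore reads 13 TB
spread over seven backup sets. Cheap storage and instant restores pull
in opposite directions.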

So while I appreciate that you are asking for help with solutions,
there is still that part where you have to do a bit of the homework
and figure out your limitations, and not just repeat "I want to back
up S3 and RBD".

> > Could someone please let me know how to take S3 and RBD backup from Ceph side and possibility to take backup from Client/user side?
> > Which tool should I use for the backup?
>
> Backing data up, or replicating it is a choice between a lot of
> variables and options, and choosing something that has the least
> negative effects for your own environment and your own demands. Some
> options will cause a lot of network traffic, others will use a lot of
> CPU somewhere, others will waste disk on the destination for
> performance reasons and some will have long and complicated restore
> procedures. Some will be realtime copies but those might put extra
> load on the cluster while running, others will be asynchronous but
> might need a database at all times to keep track of what not to copy
> because it is already at the destination. Some synchronous options
> might even cause writes to be slower in order to guarantee that ALL
> copies are in place before sending clients an ACK, some will not and
> those might lose data that the client thought was delivered 100% ok.
>
> Without knowing what your demands are, or knowing what situation and
> environment you are in, it will be almost impossible to match the
> above into something that is good for you.
> Some might have a monetary cost, some may require a complete second
> cluster of equal size, some might have a cost in terms of setup work
> from clueful ceph admins that will take a certain amount of time and
> effort. Some options might require clients to change how they write
> data into the cluster in order to help the backup/replication system.
>
> There is unfortunately not a single best choice for all clusters,
> there might even not exist a good option just to cover both S3 and RBD
> since they are inherently very different.
> RBD will almost certainly be only full restores of a large complete
> image, while S3 users might want only the object
> foo/bar/MyImportantWriting.doc from last Wednesday back, and not
> to revert the whole bucket or the whole S3 setup.
>
> I'm quite certain that there will not be a single cheap, fast,
> efficient, scalable, unnoticeable, easy solution that solves
> all these problems at once, but rather you will have to focus on what
> the toughest limitations are (money, time, disk, rackspace, network
> capacity, client and IO demands?) and look for solutions (or products)
> that work well with those restrictions.
>
> --
> May the most significant bit of your life be positive.

-- 
May the most significant bit of your life be positive.