Re: Backups

Gandalf Corvotempesta <gandalf.corvotempesta@xxxxxxxxx> · Thu, 23 Mar 2017 21:29:40 +0100

Yes, but the biggest issue is how to recover. You'll need to recover the whole storage, not a single snapshot, and this can take days.

On 23 Mar 2017 9:24 PM, "Alvin Starr" <alvin@xxxxxxxxxx> wrote:

    For volume backups you need something like snapshots.
    If you take a snapshot A of a live volume L that snapshot
        stays at that moment in time and you can rsync that to another
        system or use something like deltacp.pl to copy it.
    The usual process is to delete the snapshot once it's copied
        and then repeat the process when the next backup is
        required.
    That process does require rsync/deltacp to read the complete
        volume on both systems which can take a long time.
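A minimal Python sketch of the copy step described above, in the spirit of a block-level delta copy such as deltacp.pl (whose actual implementation may differ). The function name `sync_changed_blocks` and the 4 KiB default block size are hypothetical. It reads both images end to end, which is exactly the cost mentioned above, but only rewrites the blocks whose hashes differ.

```python
import hashlib
import os
from typing import List


def sync_changed_blocks(src_path: str, dst_path: str, block_size: int = 4096) -> List[int]:
    """Rewrite only the blocks of dst_path that differ from src_path.

    Hypothetical helper: src_path is the frozen snapshot image and dst_path
    an existing backup copy. Both files are read end to end, so the cost
    grows with the volume size, not with the amount of changed data.
    Returns the indices of the blocks that were rewritten.
    """
    rewritten = []
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        index = 0
        while True:
            block = src.read(block_size)
            if not block:
                break
            offset = index * block_size
            dst.seek(offset)
            current = dst.read(len(block))
            # Over a network only the digests would be exchanged; locally we
            # compare them the same way to mirror that protocol.
            if hashlib.sha256(current).digest() != hashlib.sha256(block).digest():
                dst.seek(offset)
                dst.write(block)
                rewritten.append(index)
            index += 1
        # Keep the backup exactly as long as the snapshot.
        dst.truncate(os.path.getsize(src_path))
    return rewritten
```

Calling it twice in a row returns an empty list the second time, but it still reads every block of both files to find that out.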

    I was kicking around the idea to try and handle snapshot
        deltas better.
    The idea is that you could take your initial snapshot A then
        sync that snapshot to your backup system.
    At a later point you could take another snapshot B.
    Because a snapshot holds copies of the original data as of the
        moment it was taken, while unmodified data still points to the live
        volume, it is possible to tell which blocks of data have changed
        since the snapshot was taken.
    Now that you have a second snapshot you can in essence perform
        a diff on the A and B snapshots to get only the blocks that
        changed up to the time that B was taken.
    These blocks could be copied to the backup image and you
        should have a clone of the B snapshot.
    You would not have to read the whole volume image, just
        the changed blocks, dramatically improving the speed of the
        backup.
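A toy model of the snapshot-delta idea, assuming simple copy-on-write snapshots in which each snapshot keeps an exception store of original blocks the first time the live volume overwrites them (real LVM or ZFS snapshot internals differ). `Volume`, `Snapshot` and `incremental_sync` are hypothetical names. Using A's exception store as the changed-block list is one way to realize the "diff" of A and B: every block written since A was taken is in it, so copying those indices as seen through B turns a clone of A into a clone of B without reading the rest of the volume.

```python
from typing import Dict, List


class Snapshot:
    """Copy-on-write snapshot: keeps original blocks the live volume overwrote."""

    def __init__(self, volume: "Volume") -> None:
        self.volume = volume
        self.cow: Dict[int, bytes] = {}  # block index -> data at snapshot time

    def read(self, i: int) -> bytes:
        # Changed blocks come from the exception store, the rest from the live volume.
        return self.cow.get(i, self.volume.blocks[i])


class Volume:
    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)
        self.snapshots: List[Snapshot] = []

    def take_snapshot(self) -> Snapshot:
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def delete_snapshot(self, snap: Snapshot) -> None:
        self.snapshots.remove(snap)

    def write(self, i: int, data: bytes) -> None:
        # The first write to a block after a snapshot preserves the old contents.
        for snap in self.snapshots:
            if i not in snap.cow:
                snap.cow[i] = self.blocks[i]
        self.blocks[i] = data


def incremental_sync(snap_a: Snapshot, snap_b: Snapshot, backup: List[bytes]) -> List[int]:
    """Turn `backup` (a clone of snap_a) into a clone of snap_b.

    Every block written since A was taken is in A's exception store, so
    only those indices are read (through B's view) and copied.
    """
    changed = sorted(snap_a.cow)
    for i in changed:
        backup[i] = snap_b.read(i)
    return changed
```

Some blocks in that set may have been written only after B was taken; copying them is harmless because reading through B returns their pre-B contents. After the sync, `vol.delete_snapshot(snap_a)` and treating `snap_b` as the new A gives the A/B rotation Alvin describes.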

    At this point you can delete the A snapshot and promote the B
        snapshot to be the A snapshot for the next backup round.

    On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:

      Are backups consistent?
        What happens if the header on shard0 is synced,
          referring to some data on shard450, and by the time rsync parses
          shard450 that data has been changed by subsequent writes?

        The header would be backed up out of sync with respect to the
          rest of the image.
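A toy illustration of the torn-copy problem raised above, under a hypothetical model where block 0 holds a header that points at data in block 450 and the guest updates both in one logical write. All names here are illustrative. Copying the live volume block by block lets the write land between the two reads; copying from a frozen point-in-time view does not. A snapshot of a running VM is still only crash-consistent, and a sharded volume would additionally need every shard captured at the same instant.

```python
from typing import Dict

Image = Dict[int, str]


def guest_write(volume: Image, version: int) -> None:
    """One logical guest write: header (block 0) and data (block 450) change together."""
    volume[0] = f"header->v{version}"
    volume[450] = f"data-v{version}"


def is_consistent(image: Image) -> bool:
    # The header must name the same version as the data it points to.
    return image[0].split("->")[1] == image[450].split("-")[1]


def copy_live(volume: Image, write_after_block: int) -> Image:
    """Copy the live volume block by block; the guest writes v2 mid-copy."""
    backup: Image = {}
    for i in sorted(volume):
        backup[i] = volume[i]
        if i == write_after_block:
            guest_write(volume, 2)
    return backup


def copy_from_snapshot(volume: Image, write_after_block: int) -> Image:
    """Same copy, but from a frozen point-in-time view of the volume."""
    snapshot = dict(volume)  # stands in for a real snapshot
    backup: Image = {}
    for i in sorted(snapshot):
        backup[i] = snapshot[i]
        if i == write_after_block:
            guest_write(volume, 2)  # the live volume moves on, the snapshot does not
    return backup
```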

        On 23 Mar 2017 8:48 PM, "Joe Julian" <joe@xxxxxxxxxxxxxxxx> wrote:

              The rsync protocol only passes blocks that have
                actually changed. Raw changes fewer bits. You're right,
                though, that it still has to check the entire file for
                those changes.
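A minimal sketch of the rsync delta idea Joe describes, following the weak rolling checksum published for rsync, with SHA-256 standing in for rsync's strong checksum (real rsync's wire format, hashes and block-size heuristics differ). The names `weak`, `roll`, `delta` and `patch` are hypothetical. It shows both halves of his point: only unmatched bytes are sent as literals, yet the sender still slides a window over the entire new file to find matches.

```python
import hashlib
from typing import Dict, List, Tuple, Union

M = 1 << 16  # modulus of the weak checksum


def weak(data: bytes) -> Tuple[int, int]:
    """Weak checksum (a, b) of one block."""
    n = len(data)
    a = sum(data) % M
    b = sum((n - i) * x for i, x in enumerate(data)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window one byte forward in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def strong(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def delta(old: bytes, new: bytes, n: int) -> List[Union[int, bytes]]:
    """Encode `new` as indices of n-byte blocks of `old` plus literal bytes."""
    table: Dict[Tuple[int, int], List[Tuple[int, bytes]]] = {}
    for idx in range(len(old) // n):
        blk = old[idx * n:(idx + 1) * n]
        table.setdefault(weak(blk), []).append((idx, strong(blk)))

    ops: List[Union[int, bytes]] = []
    literal = bytearray()
    k = 0
    ab = weak(new[:n])
    while k + n <= len(new):
        window = new[k:k + n]
        # A weak hit is only trusted once the strong hash agrees.
        match = next((idx for idx, h in table.get(ab, []) if h == strong(window)), None)
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal = bytearray()
            ops.append(match)
            k += n
            ab = weak(new[k:k + n])  # fresh checksum after jumping a whole block
        else:
            literal.append(new[k])
            if k + n < len(new):
                ab = roll(ab[0], ab[1], new[k], new[k + n], n)
            k += 1
    literal.extend(new[k:])
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old: bytes, ops: List[Union[int, bytes]], n: int) -> bytes:
    """Rebuild the new file on the receiver from its old copy and the delta."""
    out = bytearray()
    for op in ops:
        out += old[op * n:(op + 1) * n] if isinstance(op, int) else op
    return bytes(out)
```

For example, inserting one byte into a 16-byte file with 4-byte blocks yields `[0, b"X", 1, 2, 3]`: four block references and a single literal byte.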

              On 03/23/17 12:47, Gandalf Corvotempesta wrote:

                Raw or qcow doesn't change anything
                  about the backup.
                  Georep always has to sync the whole
                    file.

                  Additionally, raw images have far fewer
                    features than qcow.

                  On 23 Mar 2017 8:40 PM, "Joe Julian" <joe@xxxxxxxxxxxxxxxx> wrote:

                        I always use raw images. And yes, sharding
                          would also be good.

                        On 03/23/17 12:36, Gandalf Corvotempesta wrote:

                          Georep exposes another
                            problem:
                            When Gluster is used as
                              storage for VMs, each VM disk is saved as a
                              qcow file. Changes happen inside the qcow, so
                              rsync has to sync the whole file every
                              time.

                            A little workaround would be
                              sharding, as rsync would have to sync only the
                              changed shards, but I don't think this is
                              a good solution.
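A sketch of why sharding helps an rsync-style sync, assuming each shard is a separate file and the sender uses a size-plus-mtime "quick check" like rsync's default (rsync's real logic has more cases, such as `--checksum` and `--modify-window`). The function `shards_to_send`, the directory layout and shard names such as `vm1.2` are hypothetical. Only shards whose metadata changed are even considered, so a small write to a large image selects one shard rather than the whole file.

```python
import os
from typing import List


def shards_to_send(src_dir: str, dst_dir: str) -> List[str]:
    """Names of shard files whose size or mtime differ between src and dst.

    Files with matching size and (whole-second) mtime are skipped without
    reading their contents, like a quick check.
    """
    changed = []
    for name in sorted(os.listdir(src_dir)):
        src = os.stat(os.path.join(src_dir, name))
        dst_path = os.path.join(dst_dir, name)
        if not os.path.exists(dst_path):
            changed.append(name)  # new shard, never synced before
            continue
        dst = os.stat(dst_path)
        if src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime):
            changed.append(name)
    return changed
```

The trade-off is the one the thread raises: metadata checks are cheap, but they say nothing about whether the set of shards copied at different moments forms a consistent image.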

                            On 23 Mar 2017 8:33 PM, "Joe Julian" <joe@xxxxxxxxxxxxxxxx> wrote:

                                  In many cases, a full backup set is
                                    just not feasible. Georep to the
                                    same or a different DC may be an
                                    option if the bandwidth can keep up
                                    with the change set. If not, consider
                                    breaking the data up into smaller,
                                    more manageable volumes, keeping
                                    only a smaller set of critical
                                    data and backing just that up. Perhaps
                                    an object store (Swift?) might
                                    handle fault-tolerant distribution
                                    better for some workloads.
                                  There's no one right answer.
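Whether georep "can keep up with the change set" is a one-line comparison. A sketch with illustrative numbers only, assuming decimal units and a hypothetical `efficiency_pct` for the fraction of the link the replication actually achieves. If the sync has to resend whole files, as described for qcow images earlier in the thread, the effective change set is larger than the data the guests actually wrote.

```python
def georep_keeps_up(change_bytes_per_day: int, link_bits_per_s: int, efficiency_pct: int = 100) -> bool:
    """True if one day's worth of changes fits through the link in one day."""
    usable_bytes_per_s = link_bits_per_s * efficiency_pct // 100 // 8
    return change_bytes_per_day <= usable_bytes_per_s * 86_400
```

For instance, a 1 Gbit/s link at 80% efficiency moves about 8.64 TB per day, so 5 TB/day of changes fits and 10 TB/day does not.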

                                  On 03/23/17 12:23, Gandalf Corvotempesta wrote:

                                    Backing up from
                                      inside each VM doesn't solve the
                                      problem.
                                      If you have to
                                        back up 500 VMs, you need more
                                        than one day, and what if you have
                                        to restore the whole Gluster
                                        storage?

                                      How many days do
                                        you need to restore 1PB?

                                      Probably the only
                                        solution would be a georep in
                                        the same datacenter/rack to a
                                        similar cluster,
                                      ready to become
                                        the master storage.
                                      In this case you
                                        don't need to restore anything,
                                        as the data is already there,
                                      only a little bit
                                        back in time, but this doubles the
                                        TCO.
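The "how many days to restore 1PB" question is simple arithmetic once a sustained rate is assumed. A sketch, assuming decimal units and, for illustration only, hosts that each sustain the 1.25 GB/s line rate of a 10GbE link; real restores are usually slower.

```python
def restore_days(total_bytes: int, per_host_bytes_per_s: float, hosts: int = 1) -> float:
    """Wall-clock days to move total_bytes with `hosts` streams in parallel."""
    return total_bytes / (per_host_bytes_per_s * hosts) / 86_400
```

Under those assumptions, a single host needs about 9.26 days for 1PB, while 20 hosts in parallel need about 11.1 hours.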

                                      On 23 Mar 2017 6:39 PM, "Serkan Çoban" <cobanserkan@xxxxxxxxx> wrote:

                                        Assuming a backup window of 12 hours,
                                          you need to send data to the backup
                                          solution at 25GB/s.

                                          Using 10G Ethernet on the hosts,
                                          you need at least 25 hosts to
                                          handle 25GB/s.

                                          You can build an EC Gluster
                                          cluster that can handle these
                                          rates, or you can just back up
                                          valuable data from inside the VMs
                                          using open source backup tools
                                          like borg, attic, restic,
                                          etc.
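These figures can be checked with a little integer arithmetic. A sketch, assuming decimal units (1PB = 10^15 bytes) and a hypothetical `efficiency_pct` for how much of a 10GbE link's 1.25 GB/s line rate each host actually sustains. Under those assumptions, 1PB in 12 hours works out to roughly 23 GB/s, which rounds up to the 25GB/s quoted, and 25 hosts corresponds to about 1 GB/s (80% of line rate) per host.

```python
def required_rate(total_bytes: int, window_s: int) -> int:
    """Bytes per second needed to move total_bytes within window_s (rounded up)."""
    return -(-total_bytes // window_s)


def hosts_needed(rate_bytes_per_s: int, nic_bits_per_s: int, efficiency_pct: int) -> int:
    """Hosts required if each sustains efficiency_pct of its NIC's line rate."""
    per_host_bytes_per_s = nic_bits_per_s * efficiency_pct // 100 // 8
    return -(-rate_bytes_per_s // per_host_bytes_per_s)  # ceiling division
```

At full line rate the same 25GB/s would need 20 hosts; the lower the real per-host throughput, the more hosts the backup tier needs.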

                                          On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta <gandalf.corvotempesta@gmail.com> wrote:

                                          > Let's assume a 1PB storage full of VM images,
                                          > with each brick on ZFS, replica 3, sharding enabled.
                                          >
                                          > How do you back up/restore that amount of data?
                                          >
                                          > Backing up daily is impossible: you'll never finish
                                          > one backup before the following one starts (in other
                                          > words, you need more than 24 hours).
                                          >
                                          > Restoring is even worse: you need more than 24 hours
                                          > with the whole cluster down.
                                          >
                                          > You can't rely on ZFS snapshots because of sharding
                                          > (a snapshot taken on one node is useless without
                                          > matching snapshots from all the other nodes holding
                                          > related shards) and you still have the same restore
                                          > speed.
                                          >
                                          > How do you back this up?
                                          >
                                          > Even georep isn't enough if you have to restore the
                                          > whole storage in case of disaster.

                                          >
                                          > _______________________________________________
                                          > Gluster-users mailing list
                                          > Gluster-users@xxxxxxxxxxx
                                          > http://lists.gluster.org/mailman/listinfo/gluster-users


-- 
Alvin Starr                   ||   voice: (905)513-7688
Netvel Inc.                   ||   Cell:  (416)806-0133
alvin@xxxxxxxxxx              ||
