The problem is not how to back up, but how to restore. How do you restore a whole cluster made of thousands of VMs?

If you move all VMs to a shared storage like Gluster, you have to consider how to recover everything after a failure of that storage. With a bunch of VMs on each server using local disks, you only had to recover the VMs affected by a single server failure; moving everything to shared storage means being prepared for a disaster in which you *must* restore everything, potentially hundreds of TB.

2017-03-23 23:07 GMT+01:00 Gambit15 <dougti+gluster@xxxxxxxxx>:
> Don't snapshot the entire gluster volume; keep a rolling routine for
> snapshotting the individual VMs & rsync those.
> As already mentioned, you need to "itemize" the backups - trying to manage
> backups for the whole volume as a single unit is just crazy!
>
> Also, for long-term backups, maintaining just the core data of each VM is
> far more manageable.
>
> I settled on oVirt for our platform, and do the following...
>
> A cronjob regularly snapshots & clones each VM, whose image is then rsynced
> to our backup storage;
> The backup server snapshots the VM's image backup volume to maintain
> history/versioning;
> These full images are only maintained for 30 days, for DR purposes;
> A separate routine rsyncs the VM's core data to its own data backup volume,
> which is snapshotted & maintained for 10 years.
>
> This could be made more efficient by using guestfish to extract the core
> data from the backup image, instead of basically rsyncing the data across
> the network twice.
>
> The active storage layer uses Gluster on top of XFS & LVM. The backup
> storage layer uses a mirrored storage unit running ZFS on FreeNAS.
> This of course doesn't allow for HA in the case of the entire cloud failing.
> For that we'd use geo-rep & a big fat pipe.
>
> D
>
> On 23 March 2017 at 16:29, Gandalf Corvotempesta
> <gandalf.corvotempesta@xxxxxxxxx> wrote:
>>
>> Yes, but the biggest issue is how to recover. You'll need to recover the
>> whole storage, not a single snapshot, and this can take days.
>>
>> On 23 Mar 2017 at 9:24 PM, "Alvin Starr" <alvin@xxxxxxxxxx> wrote:
>>>
>>> For volume backups you need something like snapshots.
>>>
>>> If you take a snapshot A of a live volume L, that snapshot stays at that
>>> moment in time, and you can rsync it to another system or use something
>>> like deltacp.pl to copy it.
>>>
>>> The usual process is to delete the snapshot once it's copied and then
>>> repeat the process when the next backup is required.
>>>
>>> That process requires rsync/deltacp to read the complete volume on
>>> both systems, which can take a long time.
>>>
>>> I have been kicking around an idea for handling snapshot deltas better.
>>>
>>> You take your initial snapshot A and sync it to your backup system.
>>>
>>> At a later point you take another snapshot B.
>>>
>>> Because a snapshot keeps copies of the original data as it was at the
>>> time the snapshot was taken, while unmodified data still points to the
>>> live volume, it is possible to tell which blocks have changed since the
>>> snapshot was taken.
>>>
>>> Now that you have a second snapshot, you can in essence perform a diff
>>> of the A and B snapshots to get only the blocks that changed up to the
>>> time B was taken.
>>>
>>> Those blocks could be copied into the backup image, and you should end
>>> up with a clone of the B snapshot.
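To make that last step concrete, here is a minimal sketch of the changed-block copy, assuming the list of changed extents has already been pulled out of the snapshot layer (LVM thin-pool metadata, a ZFS incremental stream, or similar). The device paths and the extent list below are placeholders, not real values:

    # Copy only the extents that changed between snapshot A and snapshot B
    # into the existing backup image (which currently matches snapshot A).
    SNAP_B = "/dev/vg0/vmstore-snap-b"   # hypothetical snapshot B device
    BACKUP = "/backup/vmstore.img"       # backup image synced from snapshot A

    # (offset, length) pairs in bytes; placeholder values for illustration.
    # In practice this list would come from the snapshot's copy-on-write
    # metadata, which is what avoids reading the whole volume.
    changed_extents = [(0, 4096), (1048576, 65536)]

    with open(SNAP_B, "rb") as src, open(BACKUP, "r+b") as dst:
        for offset, length in changed_extents:
            src.seek(offset)
            dst.seek(offset)
            remaining = length
            while remaining > 0:
                chunk = src.read(min(remaining, 1 << 20))  # 1 MiB at a time
                if not chunk:
                    break
                dst.write(chunk)
                remaining -= len(chunk)
    # BACKUP should now match snapshot B, as described above.

After this, the backup image can be treated as the new baseline, which is exactly the promotion of B to A described next.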
>>>
>>> You would not have to read the whole volume image, just the changed
>>> blocks, dramatically improving the speed of the backup.
>>>
>>> At this point you can delete the A snapshot and promote the B snapshot
>>> to be the A snapshot for the next backup round.
>>>
>>> On 03/23/2017 03:53 PM, Gandalf Corvotempesta wrote:
>>>
>>> Are the backups consistent?
>>> What happens if the header on shard0 is synced while referring to some
>>> data on shard450, and by the time rsync gets to shard450 that data has
>>> been changed by subsequent writes?
>>>
>>> The header would be backed up out of sync with the rest of the image.
>>>
>>> On 23 Mar 2017 at 8:48 PM, "Joe Julian" <joe@xxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> The rsync protocol only passes blocks that have actually changed. Raw
>>>> changes fewer bits. You're right, though, that it still has to check
>>>> the entire file for those changes.
>>>>
>>>> On 03/23/17 12:47, Gandalf Corvotempesta wrote:
>>>>
>>>> Raw or qcow doesn't change anything about the backup.
>>>> Geo-rep always has to sync the whole file.
>>>>
>>>> Additionally, raw images have far fewer features than qcow.
>>>>
>>>> On 23 Mar 2017 at 8:40 PM, "Joe Julian" <joe@xxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> I always use raw images. And yes, sharding would also be good.
>>>>>
>>>>> On 03/23/17 12:36, Gandalf Corvotempesta wrote:
>>>>>
>>>>> Geo-rep exposes another problem: when using Gluster as VM storage, the
>>>>> VM file is saved as qcow. Changes happen inside the qcow, so rsync has
>>>>> to sync the whole file every time.
>>>>>
>>>>> A workaround would be sharding, as rsync then only has to sync the
>>>>> changed shards, but I don't think this is a good solution.
>>>>>
>>>>> On 23 Mar 2017 at 8:33 PM, "Joe Julian" <joe@xxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> In many cases, a full backup set is just not feasible. Geo-rep to the
>>>>>> same or a different DC may be an option if the bandwidth can keep up
>>>>>> with the change set. If not, maybe break the data up into smaller,
>>>>>> more manageable volumes, where you keep only a smaller set of
>>>>>> critical data and just back that up. Perhaps an object store (Swift?)
>>>>>> might handle fault-tolerant distribution better for some workloads.
>>>>>>
>>>>>> There's no one right answer.
>>>>>>
>>>>>> On 03/23/17 12:23, Gandalf Corvotempesta wrote:
>>>>>>
>>>>>> Backing up from inside each VM doesn't solve the problem.
>>>>>> If you have to back up 500 VMs you need more than a day, and what if
>>>>>> you have to restore the whole Gluster storage?
>>>>>>
>>>>>> How many days do you need to restore 1 PB?
>>>>>>
>>>>>> Probably the only solution is geo-rep to a similar cluster in the
>>>>>> same datacenter/rack, ready to become the master storage.
>>>>>> In that case you don't need to restore anything, as the data is
>>>>>> already there, only a little bit behind in time, but this doubles
>>>>>> the TCO.
>>>>>>
>>>>>> On 23 Mar 2017 at 6:39 PM, "Serkan Çoban" <cobanserkan@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> Assuming a backup window of 12 hours, you need to send data to the
>>>>>>> backup solution at 25 GB/s.
>>>>>>> Using 10G Ethernet on the hosts, you need at least 25 hosts to
>>>>>>> handle 25 GB/s.
>>>>>>> You can create an EC Gluster cluster that can handle these rates,
>>>>>>> or just back up the valuable data from inside the VMs using
>>>>>>> open-source backup tools like borg, attic, restic, etc...
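For reference, the arithmetic behind those figures, assuming the 1 PB from the original question below, a 12-hour window, and roughly 1 GB/s of usable throughput per 10 GbE host (that last figure is an assumption):

    # Back-of-the-envelope check of the backup-window figures above.
    data_bytes = 1e15                    # 1 PB of VM images (from the original post)
    window_seconds = 12 * 3600           # assumed 12-hour backup window

    required_rate = data_bytes / window_seconds
    print(f"required throughput: {required_rate / 1e9:.1f} GB/s")  # ~23 GB/s

    usable_per_host = 1.0e9              # ~1 GB/s usable per 10 GbE host (assumption)
    print(f"hosts needed: {required_rate / usable_per_host:.0f}")  # ~23, hence "at least 25"

A full restore at the same aggregate rate takes just as long, which is the point made repeatedly earlier in the thread.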
>>>>>>>
>>>>>>> On Thu, Mar 23, 2017 at 7:48 PM, Gandalf Corvotempesta
>>>>>>> <gandalf.corvotempesta@xxxxxxxxx> wrote:
>>>>>>> > Let's assume 1 PB of storage full of VM images, with each brick on
>>>>>>> > ZFS, replica 3, sharding enabled.
>>>>>>> >
>>>>>>> > How do you back up/restore that amount of data?
>>>>>>> >
>>>>>>> > Backing up daily is impossible: you'll never finish one backup
>>>>>>> > before the next one starts (in other words, you need more than 24
>>>>>>> > hours).
>>>>>>> >
>>>>>>> > Restoring is even worse. You need more than 24 hours with the
>>>>>>> > whole cluster down.
>>>>>>> >
>>>>>>> > You can't rely on ZFS snapshots because of sharding (a snapshot
>>>>>>> > taken on one node is useless without the other nodes holding the
>>>>>>> > related shards), and you still have the same restore speed.
>>>>>>> >
>>>>>>> > How do you back this up?
>>>>>>> >
>>>>>>> > Even geo-rep isn't enough if you have to restore the whole storage
>>>>>>> > in case of disaster.
>>>
>>> --
>>> Alvin Starr                   ||   voice: (905)513-7688
>>> Netvel Inc.                   ||   Cell:  (416)806-0133
>>> alvin@xxxxxxxxxx              ||

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
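A rough sketch of the per-VM snapshot-and-rsync routine Gambit15 describes at the top of the thread. This is only an illustration under assumptions: the snapshot/clone step is hypervisor-specific (oVirt in his case) and omitted, and the paths, hostname, and image naming are hypothetical:

    import subprocess
    from pathlib import Path

    CLONE_DIR = Path("/var/lib/backup-clones")     # hypothetical: cloned VM images land here
    BACKUP_TARGET = "backup01:/backup/vm-images/"  # hypothetical rsync destination

    def ship_clone(image: Path) -> None:
        """rsync one cloned VM image to the backup server."""
        # --inplace updates the destination file in place, so the backup
        # server's own snapshots (ZFS on FreeNAS in this thread) grow only
        # by the blocks that actually changed.
        subprocess.run(["rsync", "-a", "--inplace", str(image), BACKUP_TARGET],
                       check=True)

    if __name__ == "__main__":
        # Step 1 (not shown): snapshot & clone each VM into CLONE_DIR via
        # the hypervisor API.
        for image in sorted(CLONE_DIR.glob("*.img")):
            ship_clone(image)
            image.unlink()  # drop the local clone once it has been shipped
        # Retention (30 days of full images, 10 years of core data) is then
        # handled on the backup server by snapshotting its backup volumes.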