Playing with this a bit more, I think I
see how to accomplish what I want inside Ceph.
I'm only concerned about backing up and restoring the contents of
the RADOS Gateway.
Using
rados mksnap, I can snapshot the 4 gateway pools:
.rgw
.rgw.control
.rgw.gc
.rgw.buckets
Is there a way to snapshot the four pools (or all pools) in a
single command? So far, it looks like I'll have to snap them one
at a time.
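For now, a short shell loop is probably the closest I'll get to a
single command. A rough sketch (the snapshot name is just a
placeholder):

    #!/bin/sh
    # Sketch: snapshot each RADOS Gateway pool under the same snap name.
    SNAP="${1:-snapshot1}"
    for pool in .rgw .rgw.control .rgw.gc .rgw.buckets; do
        rados -p "$pool" mksnap "$SNAP"
    done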
Restoring the contents of the .rgw.buckets pool looks fairly
straightforward. I'll know the gateway object names that I need
to restore, and rados ls -p .rgw.buckets will let me map each
gateway object to a rados object. I can use rados get to extract
the contents of each object, save it to a local file, and then
manually restore it to the RADOS Gateway.
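Roughly what I have in mind for the manual restore, assuming a pool
snapshot named snapshot1, and assuming I'm reading the rados man page
right that -s/--snap applies to read operations like ls and get:

    # Sketch only: dump every object in .rgw.buckets from the pool snapshot.
    # Object names containing '/' would need to be sanitized before being
    # used as file names.
    SNAP=snapshot1
    POOL=.rgw.buckets
    OUT=/tmp/rgw-restore
    mkdir -p "$OUT"
    rados -p "$POOL" -s "$SNAP" ls | while read -r obj; do
        rados -p "$POOL" -s "$SNAP" get "$obj" "$OUT/$obj"
    done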
It would be nice if I could start up a radosgw with
--read-only and --snap snapshot1 arguments, but a manual solution is
acceptable at this stage. I won't have to do this often, but I
need to know that I can do it if necessary.
Since I don't modify any existing objects, this system should work
well on both XFS and BtrFS. Much better than my manual
filesystem-level snapshots.
On 4/18/13 13:22, Craig Lewis wrote:
I'm new to Ceph, and considering using it to store a
bunch of static files in the RADOS Gateway. My files are all
versioned, so we never modify files. We only add new files,
and delete unused files.
I'm trying to figure out how to back everything up, to
protect against administrative and application errors.
I'm thinking about building one Ceph cluster that spans
my primary and backup datacenters, with CRUSH rules that would
store 2 replicas in each datacenter. I want to use BtrFS
snapshots, like http://blog.rot13.org/2010/02/using_btrfs_snapshots_for_incremental_backup.html, but automated and with cleanup.
I'm doing something similar now, on my NFS servers with ZFS
and a tool called zfs-snapshot-mgmt.
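For the CRUSH part, I'm picturing a rule along these lines in the
decompiled crushmap. This is just a sketch; the bucket names, the
ruleset number, and the assumption that my hosts are grouped under
datacenter buckets are all mine:

    rule rgw_two_dc {
            ruleset 3
            type replicated
            min_size 2
            max_size 4
            step take default
            step choose firstn 2 type datacenter
            step chooseleaf firstn 2 type host
            step emit
    }

With the pool's size set to 4, that should put 2 replicas in each
datacenter.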
I read that only XFS is recommended for production
clusters, since BtrFS itself is still beta. Any idea how long
until BtrFS is usable in production?
I'd prefer to run Ceph on ZFS, but I see there are some
outstanding issues in tracker. Is anybody doing Ceph on ZFS
in production? ZFS itself seems to be farther along than
BtrFS. Are there plans to make ZFS a first class supported
filesystem for Ceph?
Assuming that BtrFS and ZFS are not recommended for production, I'm thinking about XFS in the primary datacenter, and
BtrFS + snapshots in the backup datacenter. Once BtrFS or ZFS
is production ready, I'd slowly migrate all partitions off
XFS.
Once the backups are made, using them is a bit tricky.
In the event of an
operator or code error, I would mount the correct BtrFS
snapshot on all nodes in the backup datacenter, someplace like
/var/lib/ceph.restore/. Then I'd make a copy of ceph.conf,
and start building a temporary cluster that runs on a
non-standard port, made up of only the backup datacenter
machines. The normal cluster would stay up and running. Once
the temporary cluster is up, I'd manually restore the RADOS
Gateway objects that needed to be restored.
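On each OSD node in the backup datacenter, I'm imagining the mount
step looking something like this (the device, subvolume layout, and
snapshot name are all made up):

    # Sketch: mount a read-only BtrFS snapshot of the OSD data at an
    # alternate path for the temporary cluster to use.
    mkdir -p /var/lib/ceph.restore
    mount -o ro,subvol=snapshots/ceph-2013-04-18 /dev/sdb1 /var/lib/ceph.restore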
If there were ever a full-cluster problem, like me doing
something stupid such as rados rmpool metadata, I'd shut down
the whole cluster, revert all of the BtrFS partitions to the
last known good snapshot, and re-format all of the XFS
partitions. Then I'd start the cluster up again and let Ceph
replicate everything back to the freshly formatted partitions.
I'd lose recent data, but that's better than losing all of the
data.
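Per node, I'm picturing something along these lines; the subvolume
layout, snapshot name, and device are placeholders, and the XFS OSDs
would need to be re-initialized, not just re-formatted:

    # Sketch of the full-cluster revert on one node.
    service ceph stop
    mv /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0.damaged
    btrfs subvolume snapshot /var/lib/ceph/snapshots/ceph-0@last-good \
        /var/lib/ceph/osd/ceph-0
    btrfs subvolume delete /var/lib/ceph/osd/ceph-0.damaged
    mkfs.xfs -f /dev/sdc1        # XFS OSDs get wiped and re-created
    service ceph start           # then let Ceph backfill onto them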
Obviously, both of
these scenarios would need a lot of testing and many practice
runs before they're viable. Has anybody tried this before?
If not, do you see any problems with the theory?
Thanks for the help.