gluster ha/replication/disaster recover(dr translator) wish list

krishna at zresearch.com (Krishna Srinivas) · Sun, 18 Jan 2009 14:12:09 +0530

Keith,

We have discussed a translator with functionality similar to
what you have described. We called it the "backup" translator, i.e. a
translator which does delayed replication. This gives the application
better response times. We can make a lot of assumptions, such as
that the backup directory will not be written to while the primary
copy is up. We have not given this much thought yet, but it is
definitely on our minds too.

Regards
Krishna

On Mon, Jan 12, 2009 at 7:25 PM, Keith Freedman <freedman at freeformit.com> wrote:
> I just wanted to toss out a thought I had to get it on the table.
>
> For me, the replication features (in any filesystem that supports it)
> serve several purposes
>
> One is to have 2 or more copies of the data which are live and usable
> (I think Lustre doesn't offer this) -- this is handy for HA, and for
> performance (in my case, the servers are clients, and so they read
> data from their local disk and only have to go down to network speed
> when writing).
>
> Another is for disaster recovery.
>
> What I'd like to see is a DR translator, which is basically
> identical to AFR with a few notable exceptions:
> 1) it would be a one-way pipe--when data is updated, the updates are
> pushed over, and it's assumed that the DR location is never written
> to locally, so the auto-healing can make some assumptions and not
> have to do a 2 way comparison and data transfer
> 2) delayed writes -- I'd like to specify an allowable delay for
> updates (if this is 0, then my writes will block waiting on the data
> to be replicated), if this is higher, then gluster returns control
> back after it's written the file to the "local brick" but then
> replicates in the background.
> 3) delayed writes 2 --  if we're allowing delayed writes, then there
> may be an added benefit.  if the same file changes multiple times
> over a short period, we only have to transfer the most recent version
> of that data across the network.
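
Points 2) and 3) above can be sketched roughly as follows. This is only
a minimal illustration in Python, not GlusterFS code: the class
DelayedReplicator, its max_delay parameter and the flush_due() method
are all hypothetical names, and plain dicts stand in for the local
brick and the DR brick.

```python
import time


class DelayedReplicator:
    """Hypothetical one-way replicator: write locally, push to DR later."""

    def __init__(self, local, remote, max_delay, clock=time.monotonic):
        self.local = local          # dict standing in for the local brick
        self.remote = remote        # dict standing in for the DR brick
        self.max_delay = max_delay  # allowable replication delay (seconds)
        self.clock = clock          # injectable clock, handy for testing
        self.pending = {}           # path -> time of first unreplicated write
        self.transfers = 0          # number of pushes to the DR side

    def write(self, path, data):
        self.local[path] = data
        if self.max_delay == 0:
            # Delay of 0: the caller blocks until the DR copy is written.
            self._push(path)
        else:
            # Record only the first dirty time; later writes to the same
            # path coalesce into a single transfer of the latest version.
            self.pending.setdefault(path, self.clock())

    def flush_due(self):
        """Push every path whose allowable delay has expired."""
        now = self.clock()
        due = [p for p, t in self.pending.items() if now - t >= self.max_delay]
        for path in due:
            del self.pending[path]
            self._push(path)
        return due

    def _push(self, path):
        # One-way pipe: data only ever flows from local to remote.
        self.remote[path] = self.local[path]
        self.transfers += 1
```

In this sketch a background thread or timer would be expected to call
flush_due() periodically. A file rewritten several times within
max_delay crosses the network once, which is the saving described in
point 3).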
>
> So, one could have a disaster recovery site with slower Internet
> connections which are in sync within a specified amount of time.  Or
> one could even use a service like Amazon S3 as a repository without
> worrying about super huge data transfer fees.
>
> I could see it used to manage a file-serving/web farm.  For example:
> I might have 7 machines which just serve images and videos.  I update
> them by pushing a new image/video to one master server, the other 6
> get updated.
>
> If someone has updated a file on the DR box (i.e. the auto-heal would be
> triggered) instead of the file on the DR box being replicated back,
> it should be over-written with the version of the file on the master.
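
The "master always wins" heal described here could look something like
the sketch below. Again this is an illustration under assumptions, not
how AFR self-heal is implemented: heal_one_way() and digest() are
hypothetical helpers, dicts stand in for the two bricks, and removing
files that exist only on the DR side is my own assumption about what a
strict one-way mirror would do.

```python
import hashlib


def digest(data):
    """Content hash used to detect a DR copy that differs from the master."""
    return hashlib.sha256(data).hexdigest()


def heal_one_way(master, replica):
    """Make replica match master; nothing is ever copied back to master."""
    healed = []
    for path, data in master.items():
        # Missing or modified on the DR side: overwrite from the master.
        if path not in replica or digest(replica[path]) != digest(data):
            replica[path] = data
            healed.append(path)
    # Assumption: files present only on the DR side are removed, since the
    # DR location is never supposed to be written to locally.
    for path in list(replica):
        if path not in master:
            del replica[path]
            healed.append(path)
    return sorted(healed)
```

Because the direction is fixed, the heal never needs to decide which
side is newer, which is the simplification over a 2 way comparison.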
>
> This would ensure data integrity and you could put your master copy
> of files on a very hardened secure server behind a firewall or DMZ,
> and if someone breaks into a box and tries to overwrite an image or
> something it would automatically get 'healed' from the master copy.
>
> Then, if there is a disaster, and you're running from your DR site,
> you simply reverse the configuration, and after the disaster let
> things auto-heal the other direction and then switch back once things
> are in sync.
>
> those are my thoughts.
> Keith