Hi Keith,

Thanks for the detailed response.

On 01/26/09 15:31, Keith Freedman wrote:
> At 09:12 AM 1/25/2009, Prabhu Ramachandran wrote:
>> With glusterfs I used unify for the 4 partitions on each machine and
>> then afr'd the two unified disks but was told that this is not a
>> reliable way of doing things and that I'd run into problems when one
>> host goes down.
>
> it depends what happens when a host goes down.
> if the issue is "server crashed" then you should be fine doing this
> with gluster/HA translator.
>
> as long as you unify bricks that are all on one server, and AFR to a
> unify of bricks all on the other server, then if one server is down,
> AFR will only use the unify brick of the server that is up.
> when the other server comes back up, things will auto-heal the server
> that was down.

OK, I'm in luck then.  I have two machines, say A and B, each of which
has 4 different-sized partitions (on different disks) that I unify.  So
I have a unifyA and a unifyB.  I then AFR these two unified bricks.

> IF your problem is that a server is temporarily down because of an
> unstable network connection, then you have a more difficult problem.

Well, I only used the setup for a short while before Krishna told me
that I was likely to run into problems, so I stopped.  I didn't really
run into any problems in that short period.

> Here, if the network connection fails and is back up in short periods
> of time you'll always be experiencing delays as gluster is often
> waiting for timeouts, then the server is visible again, it auto-heals,
> then it's not visible and it has to timeout.
> It'll likely work just fine, but this will seem pretty slow (but no
> more so than an NFS mount behind a faulty connection I suppose).

Are you saying that when the network is down I can't use the files
stored locally on machine A, even though all I am doing is reading and
perhaps writing locally on A and not touching any files on B?  That
could be a bit of a problem in my case since I usually keep any local
builds on the partitions I'd like to share.  There could be similar
problems when one machine is down for maintenance, for example, or when
a disk crashes.

> Things can be further complicated if you have some clients that can
> see SERVER 1 and other clients that only see SERVER 2.  If this
> happens, then you will increase the likelihood of a split brain
> situation and things will go wrong when it tries to auto-heal (most
> likely requiring manual intervention to get back to a stable state).

Ahh, but in my case I don't have that problem: there are only two
machines (currently, at any rate).

> so the replication features of gluster/HA will most likely solve your
> problem.
> if you have specific concerns, post a volume config to the group so
> people can advise you on a specific configuration.

Well, I posted my configuration a while ago here:

  http://gluster.org/pipermail/gluster-users/20081217/000899.html

The attachment was scrubbed, which makes it a pain to get from there,
so I enclose the relevant parts below.

Many thanks.

cheers,
prabhu

--- glusterfs-server.vol ---

# Unify four partitions (I only display two here for brevity).

# Declare the storage directories.
volume posix1
  type storage/posix
  option directory /export/sda5/export
end-volume

# The namespace storage.
volume posix-ns
  type storage/posix
  option directory /export/sdb6/export-ns
end-volume

# snip similar blocks for other partitions

# Add locking on top of the storage volumes to create the bricks.
volume brick1
  type features/posix-locks
  option mandatory on          # enables mandatory locking on all files
  subvolumes posix1
end-volume

volume brick-ns
  type features/posix-locks
  option mandatory on          # enables mandatory locking on all files
  subvolumes posix-ns
end-volume

# Now unify the bricks.
volume unify
  type cluster/unify
  option namespace brick-ns
  subvolumes brick1 brick2 brick3 brick4
  option scheduler rr
end-volume

# Serve the unified brick on the network.
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes unify
  option auth.addr.unify.allow x.y.z.w,127.0.0.1
end-volume

------- EOF ------------

------- glusterfs-client.vol ----------

# The unified servers: 1 and 2.
volume server1
  type protocol/client
  option transport-type tcp/client
  option remote-host w.x.y.z
  option remote-subvolume unify
end-volume

volume server2
  type protocol/client
  option transport-type tcp/client
  option remote-host w1.x.y.z
  option remote-subvolume unify
end-volume

# AFR the two servers.
volume afr
  type cluster/afr
  subvolumes server1 server2
end-volume

-------- EOF ------------------
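
P.S.  Regarding using the local files on A: one thing I may try, if the
AFR translator in the release we're running supports it, is the
"option read-subvolume" I've seen mentioned in the docs, so that each
machine at least prefers its own unify for reads.  The sketch below is
untested and the option name is an assumption on my part, so corrections
are welcome; machine B's client would name server2 there instead.

--- glusterfs-client.vol on machine A (untested sketch) ---
# server1 and server2 are the same protocol/client volumes as above
# (server1 = the unify exported by machine A, server2 = by machine B).

# AFR the two servers, preferring the local one for reads.
volume afr
  type cluster/afr
  subvolumes server1 server2
  # Assumed option: read from server1 (the local unify) when possible.
  option read-subvolume server1
end-volume
-------- EOF ------------------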