Behaviour of two node degraded cluster

On 07/20/2013 12:36 AM, Allan Latham wrote:
> Software is 3.4.0beta4
> Cluster is Proxmox with 2 nodes + quorum disc.
> Gluster is set to replicate mode - 2 replicas.
>
> The intended use is that we require the data on gluster volumes to be
> available when the cluster is degraded - i.e. running on a single node
> (+ quorum disc).
>
> 1. when one node dies the volume is half-unmounted on the surviving node.
> i.e. it still shows with the mount command but we get the error
> 'transport endpoint disconnected'.
>
> 2. it is impossible to mount the volume again although a local copy of
> all the data is available in the bricks. umount reports no error and
> mount then correctly shows the gluster mount is not there. A subsequent
> mount command of the gluster volume waits a long time and then reports
> (via the logs) that the other server is dead.
>
> The reason why this is unworkable is that it makes a virtual server
> which uses a gluster volume depend on BOTH nodes being online. This is
> the exact opposite of high-availability.
>
> What have I configured wrong?
>
> I can partly understand the logic of this behaviour - you cannot
> possibly replicate to 2 nodes if only a single node is available.
> However, to deny even read access to the available data cannot be right.
>
> What I really wanted was that 'writes' are queued and written later when
> the dead node is available again (i.e. the same behaviour as gfs2 and
> unison).
>
> Any help or clarification would be appreciated.
>
> My question in its simplest form is:
>
> Is this the intended behaviour in these circumstances?
> Is it possible to configure for the behaviour I expected?
> If so, how do I do that?
>
Setting quorum on a 2-brick replica 2 volume is going to prevent writes if 
you have less than quorum. In automatic quorum mode, quorum is replicas/2 + 1 
(2 in this case). So with one node down nothing is going to be "queued" for 
writing; writes are simply denied.
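
If keeping the volume writable on a single surviving node matters more to you
than split-brain protection, client-side quorum can be relaxed. A rough sketch
("datavol" is just a placeholder volume name, and this trades away exactly the
protection automatic quorum gives you):

    # allow writes as long as at least one brick is reachable
    gluster volume set datavol cluster.quorum-type fixed
    gluster volume set datavol cluster.quorum-count 1

    # or switch client-side quorum enforcement off entirely
    gluster volume set datavol cluster.quorum-type none

Either way, quorum enforcement should only make the volume read-only; reads
from the surviving replica should still work, so a hung mount points at a
connectivity problem rather than at quorum.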

Check "gluster volume status" and make sure both your servers are 
running. It sounds like your local client is not connecting to your 
local bricks.
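
Something along these lines on the surviving node, while the other node is
down, is a good starting point (placeholder volume name again; I'm not quoting
exact 3.4 output):

    gluster peer status              # is the dead node shown as disconnected?
    gluster volume status datavol    # are the local bricks still online?
    gluster volume info datavol      # which quorum options are actually set?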

Expected behavior is that if you "pull the plug" on one of the servers, 
the client should pause ping-timeout seconds (defaults to 42) and 
continue operating as normal. If you shut down the server, tcp 
connections are closed properly and there is no hang.
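
That 42-second pause is network.ping-timeout. You can tune it if the wait is
too long for your use case, although lowering it too far invites spurious
disconnects under load (placeholder volume name):

    gluster volume set datavol network.ping-timeout 20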

For more analysis, provide a clean client log (truncate the log, mount 
the volume, cause your failure, send log) and the result of "gluster 
volume status" during your failure.
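
Roughly like this, assuming a fuse mount at /mnt/datavol served from node1
(the client log path follows the usual convention of the mount point with
slashes turned into dashes; adjust both to your setup):

    : > /var/log/glusterfs/mnt-datavol.log        # truncate the client log
    mount -t glusterfs node1:/datavol /mnt/datavol
    # ... kill or unplug the other node here ...
    gluster volume status datavol > status-during-failure.txt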

