Re: Quorum in distributed-replicate volume

On Tue, Feb 27, 2018 at 1:40 PM, Dave Sherohman <dave@xxxxxxxxxxxxx> wrote:
On Tue, Feb 27, 2018 at 12:00:29PM +0530, Karthik Subrahmanya wrote:
> I will try to explain how you can end up in split-brain even with cluster
> wide quorum:

Yep, the explanation made sense.  I hadn't considered the possibility of
alternating outages.  Thanks!

> > > It would be great if you can consider configuring an arbiter or
> > > replica 3 volume.
> >
> > I can.  My bricks are 2x850G and 4x11T, so I can repurpose the small
> > bricks as arbiters with minimal effect on capacity.  What would be the
> > sequence of commands needed to:
> >
> > 1) Move all data off of bricks 1 & 2
> > 2) Remove that replica from the cluster
> > 3) Re-add those two bricks as arbiters
> >
> > (And did I miss any additional steps?)
> >
> > Unfortunately, I've been running a few months already with the current
> > configuration and there are several virtual machines running off the
> > existing volume, so I'll need to reconfigure it online if possible.
> >
> Without knowing the volume configuration it is difficult to suggest a
> configuration change, and since it is a live system you may end up with
> data unavailability or data loss.
> Can you give the output of "gluster volume info <volname>" and tell us
> which brick is of what size?

Volume Name: palantir
Type: Distributed-Replicate
Volume ID: 48379a50-3210-41b4-9a77-ae143c8bcac0
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: saruman:/var/local/brick0/data
Brick2: gandalf:/var/local/brick0/data
Brick3: azathoth:/var/local/brick0/data
Brick4: yog-sothoth:/var/local/brick0/data
Brick5: cthulhu:/var/local/brick0/data
Brick6: mordiggian:/var/local/brick0/data
Options Reconfigured:
features.scrub: Inactive
features.bitrot: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
network.ping-timeout: 1013
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
features.shard: on
cluster.data-self-heal-algorithm: full
storage.owner-uid: 64055
storage.owner-gid: 64055


For brick sizes, saruman/gandalf have

$ df -h /var/local/brick0
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/gandalf-gluster  885G   55G  786G   7% /var/local/brick0

and the other four have

$ df -h /var/local/brick0
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        11T  254G   11T   3% /var/local/brick0

If you want to use the first two bricks as arbiters, then you need to be aware
of the following things (a rough command sketch follows below):
- Your distribution count will be decreased to 2.
- The data on the first subvolume (i.e., replica subvol 1) will be unavailable
  until it has been copied to the other subvolumes after the bricks are removed
  from the cluster.
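
A very rough sketch of that sequence, assuming the saruman/gandalf pair is
drained first and the freed disks are then attached as arbiters for the two
remaining subvolumes (the /var/local/arbiter0 paths below are placeholders,
not your actual layout), could look like this:

  # Migrate the data off the first replica pair, then drop it from the volume
  # (wait for "status" to report the migration as completed before "commit")
  gluster volume remove-brick palantir \
      saruman:/var/local/brick0/data gandalf:/var/local/brick0/data start
  gluster volume remove-brick palantir \
      saruman:/var/local/brick0/data gandalf:/var/local/brick0/data status
  gluster volume remove-brick palantir \
      saruman:/var/local/brick0/data gandalf:/var/local/brick0/data commit

  # Re-add the freed disks as arbiter bricks, one per remaining subvolume
  # (brick order decides which subvolume each arbiter is attached to)
  gluster volume add-brick palantir replica 3 arbiter 1 \
      saruman:/var/local/arbiter0 gandalf:/var/local/arbiter0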

Since arbiter bricks need not be the same size as the data bricks, if you can
configure three more arbiter bricks based on the sizing guidelines in the doc
[1], you can make the change live and keep the distribution count unchanged.
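
For that approach, a minimal sketch (hostA/hostB/hostC and the arbiter paths
are placeholders for wherever the three new small bricks would live) is a
single add-brick call that attaches one arbiter brick per existing subvolume,
in brick order:

  # Convert the 3 x 2 volume into 3 x (2 + 1) arbiter subvolumes, live
  gluster volume add-brick palantir replica 3 arbiter 1 \
      hostA:/var/local/arbiter/data \
      hostB:/var/local/arbiter/data \
      hostC:/var/local/arbiter/data

  # Self-heal then populates the arbiter bricks; progress can be watched with
  gluster volume heal palantir info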

One more thing from the volume info: only the options which have been
reconfigured appear in the volume info output. Since cluster.quorum-type is in
that list, it has been reconfigured manually at some point.
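
If you ever want an option to go back to its default (and drop out of that
list), the usual way is a volume reset, for example:

  gluster volume reset palantir cluster.quorum-type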

[1] http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#arbiter-bricks-sizing

Regards,
Karthik


--
Dave Sherohman

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
