Re: [Gluster-users] AFR arbiter volumes

David Gossage <dgossage@xxxxxxxxxxxxxxxxxx> · Wed, 9 Sep 2015 10:57:44 -0500

Once a volume has been created as an arbiter volume, can it later be changed to a normal replica 3 volume with all bricks containing data?

David Gossage

Carousel Checks Inc. | System Administrator

Office 708.613.2284

On Tue, Sep 8, 2015 at 8:46 PM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

    Sending out this mail for awareness/feedback.

-----------------------------------------------------------------------------

      What:

      Since glusterfs-3.7, AFR supports creation of arbiter
      volumes. These are a special type of replica 3 gluster volume
      where the 3rd brick is (always) configured as an arbiter
      node. This means that the 3rd brick stores only the file name
      and metadata (including gluster xattrs), but does not contain
      any data. Arbiter volumes prevent split-brains, consume less
      space than a normal replica 3 volume, and provide better
      consistency and availability than a replica 2 volume.

      How:

      You can create an arbiter volume with the following command:

        gluster volume create <VOLNAME> replica 3 arbiter 1
        host1:brick1 host2:brick2 host3:brick3
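
      For illustration, a filled-in version of the command might look
      like the sketch below. The volume name testvol, the host names
      server1..server3 and the /bricks/... paths are placeholders
      invented for this example. The snippet only assembles and prints
      the command so it can be reviewed before being run.

```shell
# Placeholder names only: "testvol", server1..server3 and the /bricks paths
# are assumptions for this example, not values from the announcement.
VOLNAME=testvol

# The 3rd brick listed (server3:/bricks/arbiter) becomes the arbiter brick.
create_cmd="gluster volume create $VOLNAME replica 3 arbiter 1 \
server1:/bricks/brick1 server2:/bricks/brick2 server3:/bricks/arbiter"

# Print the command for review instead of executing it.
echo "$create_cmd"
```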

      Note that the syntax is similar to creating a normal replica 3
      volume, with the exception of the arbiter 1 keyword. As seen in
      the command above, the only permissible values for the replica
      count and arbiter count are 3 and 1 respectively. Also, the 3rd
      brick is always chosen as the arbiter brick; choosing any other
      brick as the arbiter is currently not configurable.

      Client/Mount behaviour:

      By default, client quorum (cluster.quorum-type) is set to auto
      for a replica 3 volume (including arbiter volumes) when it is
      created; i.e. at least 2 bricks need to be up to satisfy quorum
      and to allow writes. This setting must not be changed for
      arbiter volumes either. In addition, arbiter volumes have some
      extra checks to prevent files from ending up in split-brain:

          * Clients take full file locks when writing to a file, as
            opposed to range locks in a normal replica 3 volume.

          * If 2 bricks are up, one of them is the arbiter (i.e. the
            3rd brick), and it blames the other up brick, then all FOPs
            (file operations) fail with ENOTCONN (Transport endpoint is
            not connected). If the arbiter doesn't blame the other
            brick, FOPs are allowed to proceed. 'Blaming' here is with
            respect to the values of the AFR changelog extended
            attributes.

          * If 2 bricks are up and the arbiter is down, then FOPs are
            allowed.

          * In all cases, if there is only one source before the FOP is
            initiated and the FOP fails on that source, the application
            receives ENOTCONN.
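
      The up/down and blame checks above can be summarised as a small
      decision function. This is only a hypothetical model to make the
      rules easier to follow, not GlusterFS code: the function name,
      its arguments and the "quorum-not-met" label are invented here,
      and the last check (a FOP failing on the only source) is left
      out.

```shell
# Hypothetical model of the checks listed above; not GlusterFS code.
# Usage: fop_decision B1_UP B2_UP ARBITER_UP ARBITER_BLAMES   (each 1 or 0)
# ARBITER_BLAMES only matters when the arbiter and exactly one data brick are up.
fop_decision() {
    up=$(( $1 + $2 + $3 ))
    if [ "$up" -lt 2 ]; then
        # Fewer than 2 bricks up: client quorum (auto) is not satisfied.
        echo "quorum-not-met"
    elif [ "$up" -eq 2 ] && [ "$3" -eq 1 ] && [ "$4" -eq 1 ]; then
        # Arbiter plus one data brick are up and the arbiter blames that brick.
        echo "ENOTCONN"
    else
        # All 3 up, both data bricks up, or arbiter up without blaming its peer.
        echo "allowed"
    fi
}

fop_decision 1 0 1 1   # data brick 1 + arbiter up, arbiter blames brick 1
fop_decision 1 1 0 0   # both data bricks up, arbiter down
```

      In this model the first call prints ENOTCONN and the second
      prints allowed.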

      Note: It is possible to see from the mount point whether a
      replica 3 volume has an arbiter configuration. If
        $mount_point/.meta/graphs/active/$V0-replicate-0/options/arbiter-count
      exists and its value is 1, then it is an arbiter volume. Also, the
      client volume graph will have arbiter-count as an xlator option for
      AFR translators.
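
      The check described in the note could be wrapped in a small
      helper like the sketch below. The function name is_arbiter_volume
      is made up for this example. It looks only at the replicate-0
      subvolume named in the path above and assumes the file holds the
      bare value 1.

```shell
# Hypothetical helper: prints "yes" if the mounted volume reports arbiter-count 1.
# $1 = mount point, $2 = volume name (the $V0 in the path from the note).
# Only the first replica subvolume (replicate-0) is inspected.
is_arbiter_volume() {
    f="$1/.meta/graphs/active/$2-replicate-0/options/arbiter-count"
    if [ -f "$f" ] && [ "$(cat "$f")" = "1" ]; then
        echo yes
    else
        echo no
    fi
}
```

      It would be called as, for example, is_arbiter_volume /mnt/glusterfs
      testvol, where both arguments are placeholders.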

      Self-heal daemon behaviour:

      Since the arbiter brick does not store any data for the files, it
      cannot be used as a source for data self-heal. For example, if
      there are 2 source bricks B2 and B3 (B3 being the arbiter brick)
      and B2 is down, then data self-heal will not happen from B3 to
      the sink brick B1; it will stay pending until B2 comes up and the
      heal can happen from it. Note that metadata and entry self-heals
      can still happen from B3 if it is one of the sources.
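
      The source selection for data self-heal can be modelled as
      filtering the arbiter and any down bricks out of the list of
      sources. The sketch below is a hypothetical illustration using
      the brick names from the example above, not the self-heal
      daemon's actual logic; the function name and argument layout are
      invented here.

```shell
# Hypothetical model, not GlusterFS code.
# Usage: data_heal_sources ARBITER "SOURCES" "UP_BRICKS"
# Prints the bricks that can act as a source for data self-heal.
data_heal_sources() {
    arbiter=$1
    result=""
    for b in $2; do
        # The arbiter holds no file data, so it is never a data source.
        [ "$b" = "$arbiter" ] && continue
        case " $3 " in
            *" $b "*) result="$result $b" ;;   # source brick is up: usable
        esac
    done
    echo "${result# }"
}

data_heal_sources B3 "B2 B3" "B1 B3"      # B2 down: prints an empty line
data_heal_sources B3 "B2 B3" "B1 B2 B3"   # B2 back up: prints B2
```

      An empty result corresponds to the data heal staying pending.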

    -----------------------------------------------------------------------------

      Please provide feedback if you have tried it out.

      If you ever encounter a split-brain while using an arbiter
      volume, it is a BUG - do report!

      We have had users asking for a way to convert existing replica 2
      volumes to arbiter volumes. This is definitely on our to-do list,
      in addition to some performance optimizations.

      Thanks,

      Ravi

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

http://www.gluster.org/mailman/listinfo/gluster-users
