Stateless Nodes - HowTo - was Re: glusterfs-3.3.0qa34 released

Hello,


  Well I can't believe that it's been more than a year since I started looking into a stateless cluster implementation for GlusterFS ... time flies, eh.

  First - what do I mean by "stateless"?  I mean that:
    - the user configuration of the operating environment is maintained outside of the OS
    - the operating system is destroyed on reboot or power off, and all OS and application configuration is irrecoverably lost
    - on each boot we want to get back to the user's preferred/configured operating environment through the most normal methods possible (preferably, the same commands and just-in-time generation of config files that were used to configure the system the first time are used on every boot).

  In this way, you could well argue that the OE state is maintained in a type of provisioning or orchestration tool, outside of the OS and application instances (or in my case in the Saturn configuration file that is the only persistent data maintained between running OE instances).

  Per the thread below, to get a stateless node (no clustering involved) we would remove the xattr values from each shared brick, on boot:
    removexattr(mount_point, "trusted.glusterfs.volume-id")
    removexattr(mount_point, "trusted.gfid")

  And then we would populate glusterd/glusterd.info with an externally stored UUID (to make it consistent across boots).  These three actions would allow the CLI "gluster volume create" commands to run unimpeded - thanks to Amar for that detail.

  Note 1: We've only been experimenting with DHT/Distribute, so I don't know whether other Gluster xlator modules have pedantic needs in addition to the above.
  Note 2: My glusterd directory is in /etc (/etc/glusterd/glusterd.info), whereas the current location in the popular distros is, I believe, /var/lib (/var/lib/glusterd/glusterd.info), so I will refer to the relative path in this message.
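
  As a minimal sketch of those boot-time actions (assuming the brick is mounted at /glusterfs/exports/hda, the /etc/glusterd path from Note 2, a single UUID= line in glusterd.info, and setfattr from the attr package - treat these details as my assumptions, not gospel):

    #!/bin/sh
    # Reset per-brick identity so "gluster volume create" will accept the brick again.
    BRICK=/glusterfs/exports/hda
    setfattr -x trusted.glusterfs.volume-id "$BRICK" 2>/dev/null
    setfattr -x trusted.gfid "$BRICK" 2>/dev/null

    # Re-instate the externally stored server UUID so it is stable across boots.
    SERVER_UUID=6b481ebb-859a-4c2b-8b5f-8f0bba7c3b9a    # from the external state, below
    mkdir -p /etc/glusterd
    echo "UUID=${SERVER_UUID}" > /etc/glusterd/glusterd.info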


  But we have finally scaled out beyond the limits of our largest chassis (down to 1TB free) and need to cluster to add on more capacity via the next chassis. Over the past three nights I've had a chance to experiment with GlusterFS 3.3.0 (I will be looking at 3.4.0 shortly) and create a "distribute" volume between two clustered nodes.  To get a stateless outcome we then need to be able to boot one node from scratch and have it re-join the cluster and volume from only the "gluster" CLI command/s.

  For what it's worth, I couldn't find a way to do this.  The peer probing model doesn't seem to allow an old node to rejoin the cluster.

  So many thanks to Mike of FunWithLinux for this post and for steering me in the right direction:
    http://funwithlinux.net/2013/02/glusterfs-tips-and-tricks-centos/

  The trick seems to be (in addition to the non-cluster configs, above) to manage the cluster membership outside of GlusterFS.  On boot, we automatically populate the relevant peer file (glusterd/peers/{uuid}) with the UUID, state=3, and hostname/IP address - one file for each other node in the cluster (excluding the local node).  E.g.

    # cat /etc/glusterd/peers/ab2d5444-5a01-427a-a322-c16592676d29
      uuid=ab2d5444-5a01-427a-a322-c16592676d29
      state=3
      hostname1=192.168.179.102

  Note that if you're using IP addresses as your node handles (as opposed to host names) then you must retain the same IP address across boots for this to work; otherwise you will have to modify the peer files on the existing/running cluster nodes, which will require glusterd to be restarted there.
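
  A minimal sketch of that boot-time peer file generation on node 101 (the peer UUID and address come from the external state shown further below; the variable names and single-peer scope are mine - one such file would be written per remote node):

    #!/bin/sh
    # Pre-seed glusterd with its peer so that "gluster peer probe" is not needed.
    PEER_UUID=ab2d5444-5a01-427a-a322-c16592676d29
    PEER_ADDR=192.168.179.102

    mkdir -p /etc/glusterd/peers
    printf 'uuid=%s\nstate=3\nhostname1=%s\n' "${PEER_UUID}" "${PEER_ADDR}" \
        > /etc/glusterd/peers/${PEER_UUID}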

  When you do this in the startup process you can skip "gluster peer probe" and simply run "gluster volume create", as we did in the non-clustered environment, on every node as it boots (on every boot, including the first; see the sketch below).  The nodes that are late to the party will be told that the configuration already exists, and the clustered volume *should* come up.
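
  For the two-node distribute volume described in the external state below, that per-boot create command would look something like this (volume name and brick paths are taken from the state dumps; the transport option is just the default spelled out):

    # Run on every node, on every boot; late joiners simply get told
    # that the volume already exists.
    gluster volume create myvolume transport tcp \
        192.168.179.101:/glusterfs/exports/hda \
        192.168.179.102:/glusterfs/exports/hda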

  I am still experimenting, but I say "should" because you can sometimes see a delay in the re-establishment of the clustered volume, and you can sometimes see it fail to re-establish altogether.  When it fails to re-establish, the solution seems to be a "gluster volume start" for that volume, on any node.  FWIW I believe I'm seeing this locally because Saturn tries to nicely stop all Gluster volumes on reboot, which (of course) affects the cluster - a little more integration work to do.
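
  A rough sketch of a boot-time nudge for that failure case (the "Status: Started" match against the "gluster volume info" output is my assumption about the 3.3-era format):

    #!/bin/sh
    # If the clustered volume did not come back up by itself, start it.
    VOLNAME=myvolume
    if ! gluster volume info ${VOLNAME} 2>/dev/null | grep -q '^Status: Started'; then
        gluster volume start ${VOLNAME}
    fi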
  

  The external state needed then looks like this on the first node (101):

    set gluster server        uuid 6b481ebb-859a-4c2b-8b5f-8f0bba7c3b9a
    set gluster peer0         uuid ab2d5444-5a01-427a-a322-c16592676d29
    set gluster peer0         ipv4_address 192.168.179.102
    set gluster volume0       name myvolume
    set gluster volume0       is_enabled 1
    set gluster volume0       uuid 00000000-0000-0000-0000-000000000000
    set gluster volume0       interface eth0
    set gluster volume0       type distribute
    set gluster volume0       brick0 /dev/hda
    set gluster volume0       brick1 192.168.179.102:/glusterfs/exports/hda

  And the external state needed looks like this on the second node (102):

    set gluster server        uuid ab2d5444-5a01-427a-a322-c16592676d29
    set gluster peer0         uuid 6b481ebb-859a-4c2b-8b5f-8f0bba7c3b9a
    set gluster peer0         ipv4_address 192.168.179.101
    set gluster volume0       name myvolume
    set gluster volume0       is_enabled 1
    set gluster volume0       uuid 00000000-0000-0000-0000-000000000000
    set gluster volume0       interface eth0
    set gluster volume0       type distribute
    set gluster volume0       brick0 192.168.179.101:/glusterfs/exports/hda
    set gluster volume0       brick1 /dev/hda

  Note that I assumed there was a per-volume UUID (currently all zeros in the state above) that I would need to re-instate, but I haven't seen it yet (presumably it's the value that's currently being removed from the mount point xattrs on each boot).
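
  If you want to see that value before it is removed, something like this should show it on a brick that has already been part of a volume (getfattr is from the same attr package; the hex encoding is just for readability):

    # Inspect the volume-id xattr on an existing brick, before the boot-time removal.
    getfattr -n trusted.glusterfs.volume-id -e hex /glusterfs/exports/hda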


  I hope that this information helps others who are trying to dynamically provision and re-provision virtual/infrastructure environments.  I note that it covers a topic that has not yet been written up on the Gluster site:

     HowTo - GlusterDocumentation
     http://www.gluster.org/community/documentation/index.php/HowTo
     [...]
     Articles that need to be written
     Troubleshooting
       - UUID's and cloning Gluster instances
       - Verifying cluster integrity
     [...]


  Please feel free to use this content to help contribute to that FAQ/HowTo document.


Cheers,


----- Original Message -----
>From: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
>To: "Amar Tumballi" <amarts@xxxxxxxxxx>
>Subject:  Re: glusterfs-3.3.0qa34 released
>Date: Wed, 18 Apr 2012 18:55:46 +1000
>
> 
> ----- Original Message -----
> >From: "Amar Tumballi" <amarts@xxxxxxxxxx>
> >To: "Ian Latter" <ian.latter@xxxxxxxxxxxxxxxx>
> >Subject:  Re: glusterfs-3.3.0qa34 released
> >Date: Wed, 18 Apr 2012 13:42:45 +0530
> >
> > On 04/18/2012 12:26 PM, Ian Latter wrote:
> > > Hello,
> > >
> > >
> > >    I've written a work around for this issue (in 3.3.0qa35)
> > > by adding a new configuration option to glusterd
> > > (ignore-strict-checks) but there are additional checks
> > > within the posix brick/xlator.  I can see that volume starts
> > > but the bricks inside it fail shortly there-after, and
> that of
> > > the 5 disks in my volume three of them have one
> > > volume_id and two them have another - so this isn't going
> > > to be resolved without some human intervention.
> > >
> > >    However, while going through the posix brick/xlator I
> > > found the "volume-id" parameter.  I've tracked it back
> > > to the volinfo structure in the glusterd xlator.
> > >
> > >    So before I try to code up a posix inheritance for my
> > > glusterd work around (ignoring additional checks so
> > > that a new volume_id is created on-the-fly / as-needed),
> > > does anyone know of a CLI method for passing the
> > > volume-id into glusterd (either via "volume create" or
> > > "volume set")?  I don't see one from the code ...
> > > glusterd_handle_create_volume does a uuid_generate
> > > and its not a feature of glusterd_volopt_map ...
> > >
> > >    Is a user defined UUID init method planned for the CLI
> > > before 3.3.0 is released?  Is there a reason that this
> > > shouldn't be permitted from the CLI "volume create" ?
> > >
> > >
> > We don't want to bring in this option to CLI. That is
> because we don't 
> > think it is right to confuse USER with more
> options/values. 'volume-id' 
> > is a internal thing for the user, and we don't want him to
> know about in 
> > normal use cases.
> > 
> > In case of 'power-users' like you, If you know what you
> are doing, the 
> > better solution is to do 'setxattr -x trusted.volume-id
> $brick' before 
> > starting the brick, so posix translator anyway doesn't get
> bothered.
> > 
> > Regards,
> > Amar
> > 
> 
> 
> Hello Amar,
> 
>   I wouldn't go so far as to say that I know what I'm
> doing, but I'll take the compliment ;-)
> 
>   Thanks for the advice.  I'm going to assume that I'll 
> be revisiting this issue when we can get back into 
> clustering (replicating distributed volumes).  I.e. I'm
> assuming that on this path we'll end up driving out 
> issues like split brain;
>  
> https://github.com/jdarcy/glusterfs/commit/8a45a0e480f7e8c6ea1195f77ce3810d4817dc37
> 
> 
> Cheers,
> 
> 
> 
> --
> Ian Latter
> Late night coder ..
> http://midnightcode.org/
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> 


--
Ian Latter
Late night coder ..
http://midnightcode.org/


