Hi all,

Just thought I'd reply to myself, since I've worked out something of a solution to my GlusterFS HA issues.

The solution turned out to be abandoning Heartbeat and LDirectorD and using Keepalived with a few heartbeat-like scripts. This gives server load balancing, two-second failover, and some limited recovery ability, without filling up log files with peer EOF messages.

While the two-second failover is not friendly to already-running processes that require continuous filesystem access (e.g. find, du, copy operations, databases, etc.), it should be reasonably adequate for web and mail servers, where a few seconds of downtime should not be much of an issue. (At least, I hope so, since I'll be deploying web and mail services based around this configuration shortly. Wish me luck.)

For the curious, I've attached the associated Keepalived configuration file and Perl scripts. The exact same Keepalived configuration should be in use across all hosts running GlusterFS servers. To use this with GlusterFS clients, just set the remote host in the GlusterFS client translator to the virtual server IP address you're using in keepalived.conf.

The scripts and configuration files are released under the LGPL and the FDL, respectively. Enjoy!

Kind regards,

Geoff Kassel.

On Tue, 11 Sep 2007, Geoff Kassel wrote:
> Hi all,
>
> I'm trying to set up LDirectorD (through Heartbeat) to load-balance and
> fail over client connections to GlusterFS server instances over TCP.
>
> First of all, I'm curious to find out if anyone else has attempted this,
> as I've had no luck maintaining client continuity with round-robin DNS
> in /etc/hosts and client timeouts, as advised in previous posts and
> tutorials. The clients just go dead with 'Transport endpoint is not
> connected' messages.
>
> My main problem is that LDirectorD doesn't seem to recognize that a
> GlusterFS server is functional through the connection test method, so I
> can't detect if a server goes down.
> While LDirectorD does support a request-response method of liveness
> detection, the GlusterFS protocol is unfortunately too lengthy to use in
> the configuration files. (It needs to be a request that can fit on a
> single line, it seems.)
>
> I'm wondering if there's a simple request-response connection test I
> haven't found yet that I can use to check for liveness of a server over
> TCP. If there isn't... could I make a feature request for such? Anything
> that can be done manually over a telnet connection to the port would be
> perfect.
>
> Thank you for GlusterFS, and thanks in advance for your time and effort
> in answering my question.
>
> Kind regards,
>
> Geoff Kassel.
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
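To illustrate the client-side change mentioned above: in the client volume spec, the protocol/client translator just needs to point at the virtual IP from keepalived.conf instead of a physical server. The volume and subvolume names below are hypothetical, not from my actual setup:

```
volume client
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.3      # the keepalived virtual server IP
  option remote-port 6996
  option remote-subvolume brick       # hypothetical server-side volume name
end-volume
```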
! Configuration File for keepalived

# Distributed storage - GlusterFS
vrrp_instance INT_GLUSTERFSD {
    interface eth0
    state MASTER
    virtual_router_id 1
    priority 10
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass *secret*
    }
    virtual_ipaddress {
        192.168.0.3
    }
}

virtual_server 192.168.0.3 6996 {
    delay_loop 1
    lb_algo wrr
    lb_kind DR
    persistence_granularity 255.255.255.255
    protocol TCP

    real_server 127.0.0.1 6996 {
        weight 10
        notify_down "/etc/init.d/glusterfsd stop ; /etc/init.d/glusterfsd start"
        MISC_CHECK {
            misc_path "/scripts/check-glusterfsd.pl 127.0.0.1 6996 1"
        }
    }
    real_server 192.168.0.1 6996 {
        weight 1
        MISC_CHECK {
            misc_path "/scripts/check-glusterfsd.pl 192.168.0.1 6996 1"
        }
    }
    real_server 192.168.0.2 6996 {
        weight 1
        MISC_CHECK {
            misc_path "/scripts/check-glusterfsd.pl 192.168.0.2 6996 1"
        }
    }
}
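The attached check-glusterfsd.pl isn't reproduced in this message, but for readers without the attachment, a minimal liveness check in the same spirit (plain TCP connect; exit 0 means the real server is up, non-zero takes it out of the pool) could be sketched like this. This is my own Python stand-in, not the attached Perl script, and the function name and argument handling are assumptions:

```python
#!/usr/bin/env python
# Hypothetical stand-in for check-glusterfsd.pl: a keepalived MISC_CHECK
# helper that exits 0 if a TCP connection to host:port succeeds within
# the given timeout, and 1 otherwise.
import socket
import sys

def tcp_alive(host, port, timeout=1.0):
    """Return True if host:port accepts a TCP connection within timeout."""
    try:
        with socket.create_connection((host, int(port)), timeout=float(timeout)):
            return True
    except OSError:
        return False

# Usage mirrors the misc_path lines above: <script> <host> <port> <timeout>
if __name__ == "__main__" and len(sys.argv) >= 4:
    host, port, timeout = sys.argv[1:4]
    sys.exit(0 if tcp_alive(host, port, timeout) else 1)
```

Note that this only proves the port accepts connections, not that GlusterFS is answering protocol requests, which is the same limitation I ran into with LDirectorD's connection test.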