Hi all,

Just thought I'd reply to myself, since I've worked out something of a solution to my GlusterFS HA issues.

The solution turned out to be abandoning Heartbeat and LDirectorD and using Keepalived with a few heartbeat-like scripts. This gives server load balancing, two-second failover, and some limited recovery ability, without filling up log files with peer EOF messages.

While the two-second failover is not friendly to already-running processes that require continuous filesystem access (e.g. find, du, copy operations, databases, etc.), it should be reasonably adequate for web and mail servers, where a few seconds of downtime should not be much of an issue. (At least, I hope so, since I'll be deploying web and mail services based around this configuration shortly. Wish me luck.)

For the curious, I've attached the associated Keepalived configuration file and Perl scripts. The exact same Keepalived configuration should be in use across all hosts running GlusterFS servers. To use this with GlusterFS clients, just set the remote host in the GlusterFS client translator to the virtual server IP address you're using in keepalived.conf.

The scripts and configuration files are released under the LGPL and the FDL, respectively. Enjoy!

Kind regards,

Geoff Kassel.

On Tue, 11 Sep 2007, Geoff Kassel wrote:
> Hi all,
>
> I'm trying to set up LDirectorD (through Heartbeat) to load-balance and
> fail over client connections to GlusterFS server instances over TCP.
>
> First of all, I'm curious to find out if anyone else has attempted this,
> as I've had no luck maintaining client continuity with round-robin DNS
> in /etc/hosts and client timeouts, as advised in previous posts and
> tutorials. The clients just go dead with 'Transport endpoint is not
> connected' messages.
>
> My main problem is that LDirectorD doesn't seem to recognize that a
> GlusterFS server is functional through the connection test method, so I
> can't detect if a server goes down.
> While LDirectorD does support a request-response method of liveness
> detection, the GlusterFS protocol is unfortunately too lengthy to use in
> the configuration files. (It needs to be a request that can fit on a
> single line, it seems.)
>
> I'm wondering if there's a simple request-response connection test I
> haven't found yet that I can use to check for liveness of a server over
> TCP. If there isn't... could I make a feature request for such? Anything
> that can be done manually over a telnet connection to the port would be
> perfect.
>
> Thank you for GlusterFS, and thanks in advance for your time and effort
> in answering my question.
>
> Kind regards,
>
> Geoff Kassel.
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
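To illustrate the client-side change mentioned above: in the client volume spec, the protocol/client translator just needs to point at the virtual IP from keepalived.conf instead of a physical server. The volume and subvolume names below are hypothetical, not from my actual setup:

```
volume client
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.3      # the keepalived virtual server IP
  option remote-port 6996
  option remote-subvolume brick       # hypothetical server-side volume name
end-volume
```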
! Configuration File for keepalived

# Distributed storage - GlusterFS
vrrp_instance INT_GLUSTERFSD {
    interface eth0
    state MASTER
    virtual_router_id 1
    priority 10
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass *secret*
    }
    virtual_ipaddress {
        192.168.0.3
    }
}

virtual_server 192.168.0.3 6996 {
    delay_loop 1
    lb_algo wrr
    lb_kind DR
    persistence_granularity 255.255.255.255
    protocol TCP

    real_server 127.0.0.1 6996 {
        weight 10
        notify_down "/etc/init.d/glusterfsd stop ; /etc/init.d/glusterfsd start"
        MISC_CHECK {
            misc_path "/scripts/check-glusterfsd.pl 127.0.0.1 6996 1"
        }
    }
    real_server 192.168.0.1 6996 {
        weight 1
        MISC_CHECK {
            misc_path "/scripts/check-glusterfsd.pl 192.168.0.1 6996 1"
        }
    }
    real_server 192.168.0.2 6996 {
        weight 1
        MISC_CHECK {
            misc_path "/scripts/check-glusterfsd.pl 192.168.0.2 6996 1"
        }
    }
}
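The attached check-glusterfsd.pl isn't reproduced in this message, but for readers without the attachment, a minimal liveness check in the same spirit (plain TCP connect; exit 0 means the real server is up, non-zero takes it out of the pool) could be sketched like this. This is my own Python stand-in, not the attached Perl script, and the function name and argument handling are assumptions:

```python
#!/usr/bin/env python
# Hypothetical stand-in for check-glusterfsd.pl: a keepalived MISC_CHECK
# helper that exits 0 if a TCP connection to host:port succeeds within
# the given timeout, and 1 otherwise.
import socket
import sys

def tcp_alive(host, port, timeout=1.0):
    """Return True if host:port accepts a TCP connection within timeout."""
    try:
        with socket.create_connection((host, int(port)), timeout=float(timeout)):
            return True
    except OSError:
        return False

# Usage mirrors the misc_path lines above: <script> <host> <port> <timeout>
if __name__ == "__main__" and len(sys.argv) >= 4:
    host, port, timeout = sys.argv[1:4]
    sys.exit(0 if tcp_alive(host, port, timeout) else 1)
```

Note that this only proves the port accepts connections, not that GlusterFS is answering protocol requests, which is the same limitation I ran into with LDirectorD's connection test.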