Hi again,

   Looks like my scripts didn't make it through the mailing list attachment
filter. I'll try again, dropping the Perl .pl extension, and encoding the
content type as plain text. (KMail was being too helpful about the content
type.) I've updated the keepalived.conf file accordingly.

   Sorry about the extra list traffic.

Kind regards,

Geoff Kassel.

On Fri, 14 Sep 2007, Geoff Kassel wrote:
> Hi all,
>    Just thought I'd reply to myself, since I've worked out something of a
> solution to my GlusterFS HA issues.
>
>    The solution turned out to be abandoning Heartbeat and LDirectorD, and
> using Keepalived with a few heartbeat-like scripts. This gives server
> load-balancing, two-second failover and some limited recovery ability,
> without filling up log files with peer EOF messages.
>
>    While the two-second failover is not friendly to already running
> processes requiring continuous filesystem access (e.g. find, du, copy
> operations, databases etc), this should be reasonably adequate for use with
> web and mail servers where a few seconds' downtime should not be much of an
> issue. (At least, I hope so, since I'll be deploying web and mail services
> based around this configuration shortly. Wish me luck.)
>
>    For the curious, I've attached the associated Keepalived configuration
> file and Perl scripts. The exact same Keepalived configuration should be in
> use across all hosts running GlusterFS servers. To use this with GlusterFS
> clients, just set the remote host in the GlusterFS client translator to the
> virtual server IP address you're using in keepalived.conf.
>
>    Scripts and configuration files are released under the LGPL and the FDL,
> respectively. Enjoy!
>
> Kind regards,
>
> Geoff Kassel.
>
> On Tue, 11 Sep 2007, Geoff Kassel wrote:
> > Hi all,
> >    I'm trying to set up LDirectorD (through Heartbeat) to load-balance
> > and failover client connections to GlusterFS server instances over TCP.
> >
> >    First of all, I'm curious to find out if anyone else has attempted
> > this, as I've had no luck with maintaining client continuity with
> > round-robin DNS in /etc/hosts and client timeouts, as advised in previous
> > posts and tutorials. The clients just go dead with 'Transport endpoint is
> > not connected' messages.
> >
> >    My main problem is that LDirectorD doesn't seem to recognize that a
> > GlusterFS server is functional through the connection test method, so I
> > can't detect if a server goes down. While LDirectorD does a
> > request-response method of liveness detection, the GlusterFS protocol is
> > unfortunately too lengthy to use in the configuration files. (It needs to
> > be a request that can fit on a single line, it seems.)
> >
> >    I'm wondering if there's a simple request-response connection test I
> > haven't found yet that I can use to check for liveness of a server over
> > TCP. If there isn't... could I make a feature request for such? Anything
> > that can be done manually over a telnet connection to the port would be
> > perfect.
> >
> >    Thank you for GlusterFS, and thanks in advance for your time and
> > effort in answering my question.
> >
> > Kind regards,
> >
> > Geoff Kassel.
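
For comparison with the protocol-level ping used in the attached scripts, here
is a minimal sketch of a bare TCP connect test of the kind asked about in the
original question. It only shows that the port accepts connections, not that
the GlusterFS server behind it is answering requests, so it is weaker than the
heartbeat scripts. The function name tcp_open and the two-second limit are
illustrative assumptions; it relies on bash's /dev/tcp redirection and the
coreutils timeout command.

```shell
# tcp_open HOST PORT: succeeds if a TCP connection can be opened within 2 seconds.
# Hypothetical helper; checks connectivity only, not the GlusterFS protocol.
tcp_open() {
    local host=$1 port=$2
    # bash opens the socket via its /dev/tcp pseudo-device; the descriptor is
    # closed again when the child shell exits.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: tcp_open 192.168.0.1 6996 && echo "port is open"
```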
#!/usr/bin/env bash
# Usage: check-glusterfsd <hostname [localhost]> <port [6996]> <period [1]>
#
# Non-blocking GlusterFS heartbeat ping checking program.
# Launches a heartbeat ping monitoring script to monitor the
# GlusterFS server on the given host at the given port every given
# period of time (in seconds).
#
# Copyright (C) Geoff Kassel, 2007. Released under the LGPL.

host=${1:-localhost}
port=${2:-6996}
period=${3:-1}

# Check whether a glusterfsd-heartbeat process is already running.
# (Signal 0 only tests for a matching process; nothing is killed.)
pkill -0 -f "glusterfsd-heartbeat $host $port $period" && exit 0

# If we're here, no heartbeat process is running. Try to start one.
glusterfsd-heartbeat "$host" "$port" "$period" &

# Sleep a few seconds. If our spawned process is still running,
# we exit successfully.
sleep 3
pkill -0 -f "glusterfsd-heartbeat $host $port $period" || exit 1
exit 0
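
The check above finds the heartbeat process by matching its command line. A
possible variant, sketched here under the assumption that a writable pidfile
location is available, tracks the spawned process by PID instead. The names
monitor_alive and ensure_monitor and the pidfile argument are hypothetical and
not part of the attached scripts; a stale pidfile whose PID has been reused by
an unrelated process would fool this check.

```shell
# monitor_alive PIDFILE: succeeds if PIDFILE names a process that is still alive.
monitor_alive() {
    local pidfile=$1 pid
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Signal 0 tests for existence without affecting the process.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# ensure_monitor PIDFILE CMD...: start CMD in the background unless the
# process recorded in PIDFILE is still running, then record the new PID.
ensure_monitor() {
    local pidfile=$1
    shift
    monitor_alive "$pidfile" && return 0
    "$@" &
    echo $! > "$pidfile"
}

# Example (paths are assumptions):
# ensure_monitor /var/run/glusterfsd-heartbeat.pid glusterfsd-heartbeat localhost 6996 1
```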
#!/usr/bin/env perl
# Usage: glusterfsd-heartbeat <hostname [localhost]> <port [6996]> <period [1]>
#
# Continuously pings the GlusterFS server running on the given host at the given
# port every given time period (in seconds), exiting if the connection is lost.
#
# Copyright (C) Geoff Kassel, 2007. Released under the LGPL.

use strict;
use warnings;
use IO::Socket;

my ($host, $port, $period, $pingPacket, $line);

$host = "localhost";
$port = "6996";
$period = 1;

if (@ARGV == 1) { ($host) = @ARGV; }
if (@ARGV == 2) { ($host, $port) = @ARGV; }
if (@ARGV == 3) { ($host, $port, $period) = @ARGV; }
if (@ARGV > 3)
{
   die "Usage: $0 <hostname [localhost]> <port [6996]> <period [1]>\n";
}

# Try to open a connection to the server.
my $sock = IO::Socket::INET->new(
   PeerAddr => $host,
   PeerPort => $port,
   Proto    => 'tcp',
);
die "Could not connect to GlusterFS server $host:$port: $!\n" unless $sock;

# Flush after every write, so each ping is sent as soon as it is printed.
$sock->autoflush(1);

# Generate ping packet to send.
$pingPacket  = "Block Start\n";                       # Start of protocol
$pingPacket .= "0000000000000042\n";                  # Stream ID - 42 will do.
$pingPacket .= "00000001\n";                          # GF_OP_TYPE_MOP_REQUEST - management request.
$pingPacket .= "00000002\n";                          # GF_MOP_STATS - get server stats.
$pingPacket .= "Ping!---------------------------\n";  # Description of packet.
$pingPacket .= "00000000000000000000000000000022\n";  # Message contents size.
# Start of message contents - serialized dictionary type. (Total size 35 - 1.)
$pingPacket .= "00000001\n";                          # Number of keys in message dictionary. (len: 9)
$pingPacket .= "00000006:";                           # Key length (incl. null) + ':' (len: 9)
$pingPacket .= "00000002\n";                          # Key value length (incl. null.) (len: 18)
$pingPacket .= "FLAGS\00\0";                          # Dummy dictionary entry. (len: 8)
# End of message contents.
$pingPacket .= "Block End\n";                         # End of protocol

# Send an initial ping packet, before starting the heartbeat loop.
print $sock $pingPacket;

# Read everything the server sends, until the socket closes.
# (A single read loop: the original outer "while (<$sock>)" silently
# discarded one line per pass before the inner loop ran.)
while (defined ($line = <$sock>))
{
   #print STDOUT $line;

   # Have we now read a full response message?
   if ($line =~ /Block End/)
   {
      # Sleep for the given time period, then send another ping.
      sleep $period;
      print $sock $pingPacket;
   }
}

close($sock);
die "GlusterFS server $host:$port has died!\n";
#!/usr/bin/env perl
# Usage: ping-glusterd <hostname [localhost]> <port [6996]>
#
# Sends one ping only to the GlusterFS server on the given host at the given
# port.
#
# Copyright (C) Geoff Kassel, 2007. Released under the LGPL.

use strict;
use warnings;
use IO::Socket;

my ($host, $port, $line);

$host = "localhost";
$port = "6996";

if (@ARGV == 1) { ($host) = @ARGV; }
if (@ARGV == 2) { ($host, $port) = @ARGV; }
if (@ARGV > 2)
{
   die "Usage: $0 <hostname [localhost]> <port [6996]>\n";
}

# Try to open a connection to the server.
my $sock = IO::Socket::INET->new(
   PeerAddr => $host,
   PeerPort => $port,
   Proto    => 'tcp',
);
die "Could not connect to GlusterFS server $host:$port: $!\n" unless $sock;

# Flush after every write, so the ping is sent as soon as it is printed.
$sock->autoflush(1);

# Ping packet to send:
print $sock "Block Start\n";                       # Start of protocol
print $sock "0000000000000042\n";                  # Stream ID - 42 will do.
print $sock "00000001\n";                          # GF_OP_TYPE_MOP_REQUEST - management request.
print $sock "00000002\n";                          # GF_MOP_STATS - get server stats.
print $sock "--------------------------------\n";  # Description of packet purpose.
print $sock "00000000000000000000000000000022\n";  # Message contents size: 35 - 1 = 34 bytes, i.e. 22 in hex.
# Start of message contents - serialized dictionary type.
print $sock "00000001\n";                          # Number of keys in message dictionary. (len: 9)
print $sock "00000006:00000002\n";                 # Key length (incl. null) ':' value length (incl. null.) (len: 18)
print $sock "FLAGS\00\0";                          # Dummy dictionary entry required for message. (len: 8)
# End of message contents.
print $sock "Block End\n";                         # End of protocol

# Read in the response from the server.
while (defined ($line = <$sock>))
{
   #print STDOUT $line;

   # Have we now read a full response message?
   if ($line =~ /Block End/)
   {
      close($sock);
      print "Received response from server.\n";
      exit 0;
   }
}

close($sock);
die "No response from server!\n";
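
ping-glusterd exits with status 0 on a full response and non-zero otherwise,
but it will wait indefinitely if the server accepts the connection and then
never answers. A minimal sketch of a wrapper that bounds the wait and turns
the exit status into a word, assuming the coreutils timeout command is
available; the function name probe and the PROBE_TIMEOUT variable are
hypothetical.

```shell
# probe CMD...: run a liveness command with a time limit and print UP or DOWN.
# PROBE_TIMEOUT (seconds, default 5) is an illustrative setting.
probe() {
    local limit=${PROBE_TIMEOUT:-5}
    if timeout "$limit" "$@" >/dev/null 2>&1; then
        echo UP
    else
        echo DOWN
    fi
}

# Example (assumes ping-glusterd is on the PATH):
# probe ping-glusterd 192.168.0.1 6996
```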
! Configuration File for keepalived

# Distributed storage - GlusterFS
vrrp_instance INT_GLUSTERFSD {
   interface eth0
   state MASTER
   virtual_router_id 1
   priority 10
   advert_int 1
   authentication {
      auth_type PASS
      auth_pass *secret*
   }
   virtual_ipaddress {
      192.168.0.3
   }
}

virtual_server 192.168.0.3 6996 {
   delay_loop 1
   lb_algo wrr
   lb_kind DR
   persistence_granularity 255.255.255.255
   protocol TCP

   real_server 127.0.0.1 6996 {
      weight 10
      notify_down "/etc/init.d/glusterfsd stop ; /etc/init.d/glusterfsd start"
      MISC_CHECK {
         misc_path "/scripts/check-glusterfsd 127.0.0.1 6996 1"
      }
   }

   real_server 192.168.0.1 6996 {
      weight 1
      MISC_CHECK {
         misc_path "/scripts/check-glusterfsd 192.168.0.1 6996 1"
      }
   }

   real_server 192.168.0.2 6996 {
      weight 1
      MISC_CHECK {
         misc_path "/scripts/check-glusterfsd 192.168.0.2 6996 1"
      }
   }
}
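
Since the same configuration runs on every host, it can be handy after a
failover to see which host currently holds the virtual IP address from the
virtual_ipaddress block. A small sketch, assuming the iproute2 ip command and
the eth0 interface from the configuration above; the function name holds_vip
is hypothetical. It reads the one-line-per-address output of
"ip -o -4 addr show" on standard input.

```shell
# holds_vip VIP: succeeds if the "ip -o -4 addr show" output on stdin lists VIP.
holds_vip() {
    # -F matches the address literally, so dots are not treated as wildcards;
    # the trailing slash stops 192.168.0.3 from matching 192.168.0.30.
    grep -q -F " inet $1/"
}

# Example:
# ip -o -4 addr show dev eth0 | holds_vip 192.168.0.3 && echo "virtual IP is here"
```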