Re: HA, GlusterFS server protocol and LDirectorD

Geoff Kassel <gkassel@xxxxxxxxxxxxxxxxxxxxx> · Wed, 12 Sep 2007 11:48:42 +1000

Hi Majied,

> With Heartbeat, the only thing required would be a "failover IP" handled
> between the two Heartbeat servers running the glusterfsd process using
> AFR.  When a glusterfsd server running Heartbeat goes down, the other
> Heartbeat server would take over the failover IP and continue service to
> the glusterfs clients.

I think I see what you're getting at here - haresources on both machines 
something like:

node1 192.168.0.1 glusterfsd
node2 192.168.0.2 glusterfsd

Am I correct?

I haven't had much luck with getting services running through Heartbeat in the 
past (Gentoo's initscripts not being directly compatible with Heartbeat's 
status exit level requirements), but looking at the glusterfsd initscript it 
looks like it might work with Heartbeat.

I'll give this a try, and see how I go. Thanks for the idea!

I know this may be a bit off topic for the list (if there was a gluster-users 
list I'd move this there), but for anyone who was curious (I'm still curious 
about a solution to this myself) I was trying to configure Heartbeat + 
LDirectorD the following way:

/etc/ha.d/haresources:

node1   ldirectord::ldirectord.cf LVSSyncDaemonSwap::master \
IPaddr2::192.168.0.3/24/eth0/192.168.0.255

# node2 is a live backup for failover on the Linux Virtual Server daemon 
# if node1 goes down

/etc/ha.d/ldirectord.cf:

checktimeout=1
checkinterval=1
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes

virtual=192.168.0.3:6996
        real=192.168.0.1:6996 gate 1
        real=192.168.0.2:6996 gate 1
        checktype=connect
        scheduler=rr
        protocol=tcp

The real glusterfsd servers are running on 192.168.0.1 (node1) and 192.168.0.2 
(node2), and clients connect to the virtual IP address, 192.168.0.3. 

However, after Heartbeat starts, ipvsadm does not show either real server as
up, even though telnet connects to both real servers on port 6996. If I
configure a fallback (say, to 127.0.0.1:6996), I only ever get the fallback
through 192.168.0.3, and if I stop that machine, any connections through
192.168.0.3 stop too.
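
For reference, the telnet test mentioned above amounts to a plain TCP connect
with a timeout. A minimal Python sketch of that check follows. The function
name tcp_port_open and the one-second default timeout are illustrative
choices, not anything taken from ldirectord, and this is not necessarily how
checktype=connect is implemented internally:

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; it proves that something
    accepted the connection, not that the service behind it is healthy.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

Calling tcp_port_open("192.168.0.1", 6996) and
tcp_port_open("192.168.0.2", 6996) from the director would repeat the telnet
test for the two real servers in the ldirectord.cf above.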

Heartbeat doesn't see that as a failure condition - with or without the 
fallback node - so the IP address and LVS won't fail over to the other node. 
I can't see a way to configure Heartbeat to do so either. Hence my question 
about finding a way to get LDirectorD to do the detection in a more robust 
request-response manner.
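
To make "request-response" concrete, here is a rough Python sketch of the kind
of liveness probe meant here: connect, send a short request, and compare the
start of the reply with an expected string. Everything in it is an assumption
for illustration. The function name probe is made up, and no request or reply
string for the GlusterFS protocol is known here, so both are left as
caller-supplied byte strings:

```python
import socket


def probe(host, port, request, expected, timeout=1.0):
    """Send request to host:port; return True if the reply begins with expected.

    Hypothetical helper: request and expected are placeholders supplied by
    the caller, not real GlusterFS protocol strings.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            reply = b""
            # Read only as many bytes as are needed for the comparison.
            while len(reply) < len(expected):
                chunk = sock.recv(len(expected) - len(reply))
                if not chunk:
                    break  # peer closed the connection early
                reply += chunk
            return reply == expected
    except OSError:
        # Connection refused, unreachable host, or timeout: treat as down.
        return False
```

Unlike a bare connect test, this only reports a server as live if it actually
answers, which is the behaviour a single-line request in ldirectord.cf would
need to provide.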

Majied, do you or anyone else on the list have a suggestion for what I may
have missed?

Thank you all in advance for any and all suggestions (including RTFM again :)

Kind regards,

Geoff Kassel.

On Wed, 12 Sep 2007, Majied Najjar wrote:
> Hi,
>
>
> This is just my two cents. :-)
>
>
> Instead of LDirectorD, I would recommend just using Heartbeat.
>
>
> With Heartbeat, the only thing required would be a "failover IP" handled
> between the two Heartbeat servers running the glusterfsd process using
> AFR.  When a glusterfsd server running Heartbeat goes down, the other
> Heartbeat server would take over the failover IP and continue service to
> the glusterfs clients.
>
>
> Granted, this isn't loadbalancing between glusterfsd servers and only
> handles failover....
>
>
> Majied Najjar
>
>  Geoff Kassel wrote:
> > Hi all,
> >    I'm trying to set up LDirectorD (through Heartbeat) to load-balance
> > and failover client connections to GlusterFS server instances over TCP.
> >
> >    First of all, I'm curious to find out if anyone else has attempted
> > this, as I've had no luck with maintaining client continuity with
> > round-robin DNS in /etc/hosts and client timeouts, as advised in previous
> > posts and tutorials. The clients just go dead with 'Transport endpoint is
> > not connected' messages.
> >
> >    My main problem is that LDirectorD doesn't seem to recognize that a
> > GlusterFS server is functional through the connection test method, so I
> > can't detect if a server goes down. While LDirectorD does a
> > request-response method of liveness detection, the GlusterFS protocol is
> > unfortunately too lengthy to use in the configuration files. (It needs to
> > be a request that can fit on a single line, it seems.)
> >
> >    I'm wondering if there's a simple request-response connection test I
> > haven't found yet that I can use to check for liveness of a server over
> > TCP. If there isn't... could I make a feature request for such? Anything
> > that can be done manually over a telnet connection to the port would be
> > perfect.
> >
> >    Thank you for GlusterFS, and thanks in advance for your time and
> > effort in answering my question.
> >
> > Kind regards,
> >
> > Geoff Kassel.
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel