Hi Geoff,

Actually, I was thinking of something a bit simpler:

/etc/ha.d/haresources:

somehost \
    IPaddr2::10.24.0.254/24/eth1/10.24.0.255

You could run the daemon on both machines at the same time. The only thing
that would designate a host as "master" would be the failover IP, which in
this case would be "10.24.0.254".
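For what it's worth, the Heartbeat side is just the stock two-node setup on
both machines. A rough, untested sketch (the node names, interface, timings,
and shared secret below are only placeholders; adjust them for your
environment):

/etc/ha.d/ha.cf (identical on both nodes):

# logging and failure timing
logfacility local0
keepalive 2
deadtime 10
warntime 5
initdead 60
# heartbeat link between the two nodes
udpport 694
bcast eth1
auto_failback on
# cluster members, names as reported by uname -n
node node1
node node2

/etc/ha.d/authkeys (chmod 600, identical on both nodes):

auth 1
1 sha1 some-shared-secret

With that plus the haresources line above, whichever node currently holds
10.24.0.254 is the "master" as far as the glusterfs clients are concerned;
the other node keeps running glusterfsd and simply takes the address over
when its peer goes down.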
Majied


Geoff Kassel wrote:
> Hi Majied,
>
>> With Heartbeat, the only thing required would be a "failover IP" handled
>> between the two Heartbeat servers running the glusterfsd process using
>> AFR. When a glusterfsd server running Heartbeat goes down, the other
>> Heartbeat server would take over the failover IP and continue service to
>> the glusterfs clients.
>
> I think I see what you're getting at here - haresources on both machines
> something like:
>
> node1 192.168.0.1 glusterfsd
> node2 192.168.0.2 glusterfsd
>
> Am I correct?
>
> I haven't had much luck with getting services running through Heartbeat in
> the past (Gentoo's initscripts not being directly compatible with
> Heartbeat's status exit level requirements), but looking at the glusterfsd
> initscript it looks like it might work with Heartbeat.
>
> I'll give this a try, and see how I go. Thanks for the idea!
>
> I know this may be a bit off topic for the list (if there was a
> gluster-users list I'd move this there), but for anyone who was curious
> (I'm still curious about a solution to this myself), I was trying to
> configure Heartbeat + LDirectorD the following way:
>
> /etc/ha.d/haresources:
>
> node1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master \
>     IPaddr2::192.168.0.3/24/eth0/192.168.0.255
>
> # node2 is a live backup for failover on the Linux Virtual Server daemon
> # if node1 goes down
>
> /etc/ha.d/ldirectord.cf:
>
> checktimeout=1
> checkinterval=1
> autoreload=yes
> logfile="/var/log/ldirectord.log"
> quiescent=yes
>
> virtual=192.168.0.3:6996
>     real=192.168.0.1:6996 gate 1
>     real=192.168.0.2:6996 gate 1
>     checktype=connect
>     scheduler=rr
>     protocol=tcp
>
> The real glusterfsd servers are running on 192.168.0.1 (node1) and
> 192.168.0.2 (node2), and clients connect to the virtual IP address,
> 192.168.0.3.
>
> However, after Heartbeat starts, ipvsadm does not show either real server
> as up, even though telnet to both real servers on port 6996 connects. If I
> configure a fallback (say, to 127.0.0.1:6996), I only ever get the
> fallback through 192.168.0.3, and if I stop that machine, any connections
> through 192.168.0.3 stop too.
>
> Heartbeat doesn't see that as a failure condition - with or without the
> fallback node - so the IP address and LVS won't fail over to the other
> node. I can't see a way to configure Heartbeat to do so either. Hence my
> question about finding a way to get LDirectorD to do the detection in a
> more robust request-response manner.
>
> Majied, do you or anyone else on the list have a suggestion for what I may
> have missed?
>
> Thank you all in advance for any and all suggestions (including RTFM
> again :)
>
> Kind regards,
>
> Geoff Kassel.
>
> On Wed, 12 Sep 2007, Majied Najjar wrote:
>> Hi,
>>
>> This is just my two cents. :-)
>>
>> Instead of LDirectorD, I would recommend just using Heartbeat.
>>
>> With Heartbeat, the only thing required would be a "failover IP" handled
>> between the two Heartbeat servers running the glusterfsd process using
>> AFR. When a glusterfsd server running Heartbeat goes down, the other
>> Heartbeat server would take over the failover IP and continue service to
>> the glusterfs clients.
>>
>> Granted, this isn't loadbalancing between glusterfsd servers and only
>> handles failover....
>>
>> Majied Najjar
>>
>> Geoff Kassel wrote:
>>> Hi all,
>>>    I'm trying to set up LDirectorD (through Heartbeat) to load-balance
>>> and failover client connections to GlusterFS server instances over TCP.
>>>
>>>    First of all, I'm curious to find out if anyone else has attempted
>>> this, as I've had no luck with maintaining client continuity with
>>> round-robin DNS in /etc/hosts and client timeouts, as advised in
>>> previous posts and tutorials. The clients just go dead with 'Transport
>>> endpoint is not connected' messages.
>>>
>>>    My main problem is that LDirectorD doesn't seem to recognize that a
>>> GlusterFS server is functional through the connection test method, so I
>>> can't detect if a server goes down. While LDirectorD does a
>>> request-response method of liveness detection, the GlusterFS protocol is
>>> unfortunately too lengthy to use in the configuration files. (It needs
>>> to be a request that can fit on a single line, it seems.)
>>>
>>>    I'm wondering if there's a simple request-response connection test I
>>> haven't found yet that I can use to check for liveness of a server over
>>> TCP. If there isn't... could I make a feature request for such? Anything
>>> that can be done manually over a telnet connection to the port would be
>>> perfect.
>>>
>>>    Thank you for GlusterFS, and thanks in advance for your time and
>>> effort in answering my question.
>>>
>>> Kind regards,
>>>
>>> Geoff Kassel.
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel@xxxxxxxxxx
>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel