Hi Majied,

> With Heartbeat, the only thing required would be a "failover IP" handled
> between the two Heartbeat servers running the glusterfsd process using
> AFR. When a glusterfsd server running Heartbeat goes down, the other
> Heartbeat server would take over the failover IP and continue service to
> the glusterfs clients.

I think I see what you're getting at here - haresources on both machines
something like:

    node1 192.168.0.1 glusterfsd
    node2 192.168.0.2 glusterfsd

Am I correct?

I haven't had much luck with getting services running through Heartbeat in
the past (Gentoo's initscripts not being directly compatible with
Heartbeat's status exit level requirements), but looking at the glusterfsd
initscript it looks like it might work with Heartbeat.

I'll give this a try, and see how I go. Thanks for the idea!

I know this may be a bit off topic for the list (if there was a
gluster-users list I'd move this there), but for anyone who was curious
(I'm still curious about a solution to this myself) I was trying to
configure Heartbeat + LDirectorD the following way:

/etc/ha.d/haresources:

    node1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master \
        IPaddr2::192.168.0.3/24/eth0/192.168.0.255
    # node2 is a live backup for failover on the Linux Virtual Server daemon
    # if node1 goes down

/etc/ha.d/ldirectord.cf:

    checktimeout=1
    checkinterval=1
    autoreload=yes
    logfile="/var/log/ldirectord.log"
    quiescent=yes
    virtual=192.168.0.3:6996
        real=192.168.0.1:6996 gate 1
        real=192.168.0.2:6996 gate 1
        checktype=connect
        scheduler=rr
        protocol=tcp

The real glusterfsd servers are running on 192.168.0.1 (node1) and
192.168.0.2 (node2), and clients connect to the virtual IP address,
192.168.0.3.

However, ipvsadm after Heartbeat starts does not show either connection as
up, even though telnet to both real servers on port 6996 connects.
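For anyone who wants to repeat the telnet check from a script, here is a minimal sketch in Python (my choice for illustration; nothing in Heartbeat or LDirectorD requires it). The function names are made up for this example, and the addresses and port are the ones from the ldirectord.cf above. It only tests that a TCP connection opens, so it says nothing about whether glusterfsd is actually answering GlusterFS requests.

```python
import socket

# Real server addresses and port as listed in the ldirectord.cf above.
REAL_SERVERS = [("192.168.0.1", 6996), ("192.168.0.2", 6996)]


def tcp_connect_ok(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_real_servers(servers, timeout=1.0):
    """Map each (host, port) pair to the result of a plain connect test."""
    return {
        server: tcp_connect_ok(server[0], server[1], timeout)
        for server in servers
    }
```

Calling check_real_servers(REAL_SERVERS) on either node returns a dict keyed by (host, port), with True for each server that accepted the connection.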
If I configure a fallback (say, to 127.0.0.1:6996), I only ever get the
fallback through 192.168.0.3, and if I stop that machine, any connections
through 192.168.0.3 stop too.

Heartbeat doesn't see that as a failure condition - with or without the
fallback node - so the IP address and LVS won't fail over to the other
node. I can't see a way to configure Heartbeat to do so either. Hence my
question about finding a way to get LDirectorD to do the detection in a
more robust request-response manner.

Majied, do you or anyone else on the list have a suggestion for what I may
have missed?

Thank you all in advance for any and all suggestions (including RTFM
again :)

Kind regards,

Geoff Kassel.

On Wed, 12 Sep 2007, Majied Najjar wrote:
> Hi,
>
> This is just my two cents. :-)
>
> Instead of LDirectorD, I would recommend just using Heartbeat.
>
> With Heartbeat, the only thing required would be a "failover IP" handled
> between the two Heartbeat servers running the glusterfsd process using
> AFR. When a glusterfsd server running Heartbeat goes down, the other
> Heartbeat server would take over the failover IP and continue service to
> the glusterfs clients.
>
> Granted, this isn't loadbalancing between glusterfsd servers and only
> handles failover....
>
> Majied Najjar
>
> Geoff Kassel wrote:
> > Hi all,
> > I'm trying to set up LDirectorD (through Heartbeat) to load-balance
> > and failover client connections to GlusterFS server instances over TCP.
> >
> > First of all, I'm curious to find out if anyone else has attempted
> > this, as I've had no luck with maintaining client continuity with
> > round-robin DNS in /etc/hosts and client timeouts, as advised in previous
> > posts and tutorials. The clients just go dead with 'Transport endpoint is
> > not connected' messages.
> >
> > My main problem is that LDirectorD doesn't seem to recognize that a
> > GlusterFS server is functional through the connection test method, so I
> > can't detect if a server goes down.
> > While LDirectorD does a
> > request-response method of liveness detection, the GlusterFS protocol is
> > unfortunately too lengthy to use in the configuration files. (It needs to
> > be a request that can fit on a single line, it seems.)
> >
> > I'm wondering if there's a simple request-response connection test I
> > haven't found yet that I can use to check for liveness of a server over
> > TCP. If there isn't... could I make a feature request for such? Anything
> > that can be done manually over a telnet connection to the port would be
> > perfect.
> >
> > Thank you for GlusterFS, and thanks in advance for your time and
> > effort in answering my question.
> >
> > Kind regards,
> >
> > Geoff Kassel.
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel