Just wanted to update you all. Turns out the problem is my Juniper firewall - sort of. I've created a service in our Juniper that describes "Gluster" and allowed the "tcp session" to never time out. The problem comes when a server crashes and the TCP connection isn't "cleaned up". It looks like the gluster client always starts using the same outbound (source) TCP port, and in our firewall that source/dest port combination is already in use (it never times out, right) so the firewall isn't allowing it to be created again - so it's blocked.

So right now if I do a netstat -pan:

tcp 0 1 10.10.10.101:996  10.20.10.102:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:997  10.20.10.102:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT    23491/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23491/glusterfs
tcp 0 0 10.10.10.101:999  10.20.10.101:6996 ESTABLISHED 23491/glusterfs
tcp 0 1 10.10.10.101:998  10.20.10.101:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT    23491/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT    23491/glusterfs

Now if I kill the gluster process and restart it again... notice the source port doesn't change:

tcp 0 1 10.10.10.101:996  10.20.10.102:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:997  10.20.10.102:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT    23687/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23687/glusterfs
tcp 0 0 10.10.10.101:999  10.20.10.101:6996 ESTABLISHED 23687/glusterfs
tcp 0 1 10.10.10.101:998  10.20.10.101:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT    23687/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT    23687/glusterfs

Now if I kill and restart a few times... I can get lucky and get a different source port... but you can see I'm still missing a few bricks:

tcp 0 0 10.10.10.101:994  10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:995  10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:998  10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT    23745/glusterfs
tcp 0 0 10.10.10.101:997  10.20.10.101:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:996  10.20.10.101:6996 ESTABLISHED 23745/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT    23745/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT    23745/glusterfs

Now telnet always works because it always picks a random source port:

$ telnet 10.20.10.102 6996
Trying 10.20.10.102...
Connected to glusterserver (10.20.10.102).
Escape character is '^]'.

$ netstat -pan|grep telne
tcp 0 0 10.10.10.101:58757 10.20.10.102:6996 ESTABLISHED 23622/telnet

Why does gluster not use a more random source port??

I'm going to have to dig through the Juniper docs to see if I can manually close an active session (let's hope), which should fix my immediate problem, but it doesn't really fix the long-term problem.

Thoughts?

thanks,
liam

On Fri, Dec 3, 2010 at 6:51 PM, Liam Slusser <lslusser at gmail.com> wrote:
> Ah the two different IPs are because I was changing my IPs for this mailing
> list and I guess I forgot that one. :) Will try adding a static route.
> Also going to snoop traffic and see if the gluster client is actually
> getting to the server or being blocked by the firewall. I'll letcha all know
> what I find.
>
> Thanks for the ideas.
>
> Liam
>
> On Dec 3, 2010 6:32 PM, <mki-glusterfs at mozone.net> wrote:
>> On Fri, Dec 03, 2010 at 04:25:18PM -0800, Liam Slusser wrote:
>>> [root at client~]# netstat -pan|grep glus
>>> tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
>>>
>>> from the gluster client log:
>>>
>>> However, the port is obviously open...
>>>
>>> [root at client~]# telnet 10.8.11.102 6996
>>> Trying 10.2.56.102...
>>> Connected to glusterserverb (10.8.11.102).
>>> Escape character is '^]'.
>>> ^]
>>> telnet> close
>>> Connection closed.
>>
>> Looking further... why is your telnet trying 10.2.56.102 when you
>> clearly specified 10.8.11.102? Also, what happens if you do a
>> specific route for the 10.8.11.0/24 block thru the appropriate gw
>> without relying on the default gw to route for you? In this way
>> you don't end up in a situation where the client is mistakenly
>> trying to go over the wrong interface. The telnet may be switching
>> to an alternate interface to see if it gets thru?
>>
>> Mohan
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
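
For anyone who wants to check quickly which source ports are stuck against which brick, here is a minimal Python sketch that groups the SYN_SENT rows of netstat -pan output by remote endpoint. The function name stuck_source_ports and the NETSTAT_SAMPLE constant are made up for illustration; the sample rows are copied from the first netstat listing in this message, and the parser assumes the seven-column layout shown there.

```python
from collections import defaultdict

# Sample rows copied from the first netstat listing in this message.
NETSTAT_SAMPLE = """\
tcp 0 1 10.10.10.101:996 10.20.10.102:6996 SYN_SENT 23491/glusterfs
tcp 0 1 10.10.10.101:997 10.20.10.102:6996 SYN_SENT 23491/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23491/glusterfs
tcp 0 0 10.10.10.101:999 10.20.10.101:6996 ESTABLISHED 23491/glusterfs
tcp 0 1 10.10.10.101:998 10.20.10.101:6996 SYN_SENT 23491/glusterfs
"""


def stuck_source_ports(netstat_text, program="glusterfs"):
    """Map each remote endpoint to the sorted local ports stuck in SYN_SENT.

    Hypothetical helper: it assumes the column order
    proto, recv-q, send-q, local, remote, state, pid/program.
    """
    stuck = defaultdict(list)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, unix sockets, and anything not in the expected layout.
        if len(fields) != 7 or fields[0] != "tcp":
            continue
        _, _, _, local, remote, state, owner = fields
        if state != "SYN_SENT" or not owner.endswith("/" + program):
            continue
        # The local address is "ip:port"; keep only the port number.
        stuck[remote].append(int(local.rsplit(":", 1)[1]))
    return {remote: sorted(ports) for remote, ports in stuck.items()}


if __name__ == "__main__":
    for remote, ports in stuck_source_ports(NETSTAT_SAMPLE).items():
        print(remote, ports)
```

Running this before and after a client restart and comparing the two results would show whether the same source ports are being retried, which is the pattern the firewall's never-expiring session entries would block.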