Solaris - Linux Problem

Hi all
I've asked about this problem before but wanted to bring it up again
with some results from Friday's testing.  The situation involves a
Solaris box running some Java code acting as a client to a Linux box
running my code as a server.  The link will fail when the traffic on the
LAN is heavy but the Linux box (RedHat 6.2) thinks the connection is
still there.

The server side uses select() to see whether there is data and a
readsocket function to get the requests.  If the readsocket function
detects no data, it reads a couple more times, because there are always
two zero-length returns before it gets a -1 return.
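
For reference, here is a minimal sketch of the select-then-read pattern
described above, written in Python rather than the server's actual
language.  The names read_request and serve_connection are hypothetical
and do not come from the real server.  It assumes a connected TCP
stream socket, where a zero-length read after select() reports the
socket readable already means the peer has closed its end, so the
sketch treats the first such read as end-of-stream instead of reading
again.

```python
import select
import socket


def read_request(sock, timeout=1.0, bufsize=4096):
    """Wait up to `timeout` seconds for data on a connected stream socket.

    Returns None if nothing arrived in time, b"" if the peer closed the
    connection (or the read failed), otherwise the bytes received.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # Timeout: no data, but this says nothing about whether the peer is alive.
        return None
    try:
        # A zero-length result here means the peer sent FIN (orderly close).
        return sock.recv(bufsize)
    except OSError:
        # Reset or other socket error: treated like a closed connection (assumption).
        return b""


def serve_connection(sock, handle, timeout=1.0):
    """Pass each chunk of request data to `handle` until the peer closes."""
    while True:
        data = read_request(sock, timeout)
        if data is None:
            continue          # idle; keep waiting
        if data == b"":
            sock.close()      # release the port as soon as the close is seen
            return
        handle(data)
```

Note that a timeout (None) and a close (b"") are different outcomes.
If the peer vanishes without its close ever reaching this host, only
the timeout case is seen, which matches a server that believes the
link is still up.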

Here is part of the email I received from the technician who works for
the company that wrote the Solaris software.  Any light on the subject
would be appreciated.

--------------------------------------------------------------------------------
Here are the results of my testing on Friday and this weekend.

Killing/stopping the Java Service or your process does NOT cause the
problem.  I NEVER could reproduce the problem that way; I always just
had to wait for the problem to occur.  So, I looked into that more.....

I set up a continuous ping ("heartbeat") from the Solaris box to the Linux
box, started all the processes, and began sending requests.
Everything was fine for about 20 minutes.  The requests went thru and
the pings never failed.

Then, one of my requests failed and the problem had obviously
reoccurred.  The Java service lost its connection to the Linux process
and was continuously attempting to reconnect the socket.  The Java
service logs confirmed this.  Now all requests failed.

Right before my requests started failing, two of the ping requests timed
out.  Also, a couple of other ping requests did make it thru, but took
an extraordinary amount of time (491ms and 271ms instead of the usual
40ms).  Furthermore, my telnet session into the Linux box died at the
same time and it was VERY slow reconnecting.

According to the netstat command on the Solaris side, the connection was
gone; the port was no longer in use.  According to netstat on the Linux
side, the connection was still ESTABLISHED and the port was very much
still in use.

I reproduced this a couple of times, each time with the same result.
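
One plausible reading of the netstat output is a half-open connection:
if the Solaris side gave up on the socket while packets were being
lost, its close or reset may never have reached the Linux box, which
has nothing of its own to send and so never learns the peer is gone.
A common way to have the kernel probe an idle connection is TCP
keepalive.  The sketch below is a Python illustration, not the
server's actual code; the function name enable_keepalive and the
timing values are assumptions.  The per-socket timing options are
Linux-specific and may not exist on older kernels, so the sketch skips
any that are unavailable and falls back to the system-wide defaults
(two hours of idle time on Linux before the first probe).

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle TCP connection and drop it if the peer is dead.

    idle:     seconds of silence before the first probe (illustrative value)
    interval: seconds between unanswered probes (illustrative value)
    count:    unanswered probes before the connection is reported as failed
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            # Only set per-socket timers where the platform exposes them.
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With this enabled on each accepted socket, a dead peer eventually
shows up as a read error or as a readable socket in select(), rather
than a connection that stays ESTABLISHED indefinitely.
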

Lawrence Griffin
Southwest DataCom Corp.
P.O. Box 460485, San Antonio, Texas 78246, USA
Work: (210) 576-1174   Cell: (210) 269-7708   Fax: (210) 829-4220
lgriffin@texas.net
