Hi all,

I've asked about this problem before, but I wanted to bring it up again with some results from Friday's testing. The situation involves a Solaris box running some Java code, acting as a client to a Linux box running my code as a server. The link fails when traffic on the LAN is heavy, but the Linux box (RedHat 6.2) thinks the connection is still there. The server side uses select() to see whether there is data and a readsocket function to get the requests. If the readsocket function detects no data, it reads a couple more times, because there are always two zero-data returns before it gets a -1 return.

Here is some of the email I received from the technician who works for the company that wrote the Solaris software. Any light on the subject would be appreciated.

--------------------------------------------------------------------------------

Here are the results of my testing on Friday and this weekend.

Killing or stopping the Java service or your process does NOT cause the problem. I NEVER could reproduce the problem that way; I always just had to wait for the problem to occur. So I looked into that more.

I set up a continuous ping ("heartbeat") from the Solaris box to the Linux box, started all the processes, and began sending requests. Everything was fine for about 20 minutes. The requests went through and the pings never failed. Then one of my requests failed and the problem had obviously recurred. The Java service lost its connection to the Linux process and was continuously attempting to reconnect the socket. The Java service logs confirmed this. From then on, all requests failed.

Right before my requests started failing, two of the ping requests timed out. A couple of other ping requests did make it through, but took an extraordinary amount of time (491ms and 271ms instead of the usual 40ms). Furthermore, my telnet session into the Linux box died at the same time and was VERY slow to reconnect.

According to the netstat command on the Solaris side, the connection was gone; the port was no longer in use. According to netstat on the Linux side, the connection was still ESTABLISHED and the port was very much still in use. I reproduced this a couple of times, each time with the same result.
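
For anyone following along, here is a minimal sketch in C of the two server-side points this seems to touch on. It is not my actual code: read_request(), enable_keepalive() and the read_status names are made up for illustration, and it assumes a blocking TCP socket that select() has just reported readable. On a TCP stream, a 0 return from recv() normally means the peer closed its end, so reading again is not expected to produce data. The netstat output above (gone on Solaris, ESTABLISHED on Linux) looks like the opposite case, where the server never hears about the close at all; SO_KEEPALIVE asks the kernel to probe an idle connection so that a dead peer eventually shows up as an error. As far as I know the Linux default is to start probing only after about two hours of idle time, so this alone may not be fast enough.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Result codes for read_request(); these names are hypothetical. */
enum read_status { READ_OK, READ_PEER_CLOSED, READ_ERROR };

/* Ask the kernel to probe an idle connection so that a vanished peer is
 * eventually reported as an error instead of staying ESTABLISHED.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Call after select() reports fd readable (blocking socket assumed).
 * The raw recv() result is stored in *nread. */
enum read_status read_request(int fd, char *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    *nread = n;
    if (n > 0)
        return READ_OK;            /* got n bytes of request data */
    if (n == 0)
        return READ_PEER_CLOSED;   /* orderly shutdown: close fd, do not re-read */
    return READ_ERROR;             /* errno says why, e.g. ETIMEDOUT or ECONNRESET */
}
```

The idea would be to call enable_keepalive() once on each socket returned by accept(), and to close the socket and drop it from the select() set as soon as read_request() returns anything other than READ_OK.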

Lawrence Griffin
Southwest DataCom Corp.
P.O. Box 460485, San Antonio, Texas 78246, USA
Work: (210) 576-1174  Cell: (210) 269-7708  Fax: (210) 829-4220
lgriffin@texas.net