On Fri, Mar 01, 2002 at 01:05:54PM -0800, Nivedita Singhvi wrote:
> I don't understand what's going on, but after connection establishment,
> and a successful exchange of a 100-byte request, it doesn't look like
> any of the acks sent by the client are reaching the oracle server.

Ok.

> The client is seeing the data from oracle coming in, since tcpdump
> is running at your end.

Note this is tcpdump on the masq gateway (pimlott.ne.mediaone.net). As I mentioned, I don't know how to make tcpdump work on the client machine, where eth0 is the ray_cs driver.

> 6] 12:48:28.673873 bigip-www.us.oracle.com.www > pimlott.ne.mediaone.net.62080: P 1:1381(1380) ack 102 win 8192
>
> Oracle sends bytes 1 - 1380 in a full-sized packet to the client.
>
> 7] 12:48:28.673873 bigip-www.us.oracle.com.www > pimlott.ne.mediaone.net.62080: P 1381:1492(111) ack 102 win 8192
>
> Oracle sends 111 more bytes to the client.
>
> 8] 12:48:28.673873 bigip-www.us.oracle.com.www > pimlott.ne.mediaone.net.62080: P 1492:2872(1380) ack 102 win 64860 (DF)
>
> Oracle sends 1380 additional bytes (1492 - 2871).
>
> 9] 12:48:28.693873 pimlott.ne.mediaone.net.62080 > bigip-www.us.oracle.com.www: . ack 1381 win 8280 (DF)
>
> The client acks 1381, i.e. it received packet 6], at least.
>
> 10] 12:48:28.703873 pimlott.ne.mediaone.net.62080 > bigip-www.us.oracle.com.www: . ack 1381 win 8280 (DF)
>
> The client acks 1381 again, which implies it probably did not get
> packet 7] and got packet 8] directly. That is out-of-order data,
> and it triggers another ack.

Ok, this is a possible clue I would not have figured out on my own. I re-did the trace dumping the packet contents, and verified that the application on the client gets exactly the first (1380-byte) packet, no more. Could the small size of packet 7 be a clue (looking at a text dump of the packets, it doesn't correlate to any obvious boundary in the text)?
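The reasoning about packet 10 (a missing packet 7 pins the cumulative ack at 1381 no matter how much later data arrives) can be sketched as a toy receiver. This is a minimal, simplified model, not the Linux TCP code: it assumes one ack per arriving segment and ignores delayed acks, SACK and the receive window. It uses the relative sequence numbers tcpdump prints, and the function name `cumulative_acks` is hypothetical.

```python
from typing import Dict, List, Tuple


def cumulative_acks(segments: List[Tuple[int, int]], rcv_nxt: int = 1) -> List[int]:
    """Return the ack number a simplified receiver sends after each segment.

    segments: (seq, length) pairs in relative sequence numbers, in arrival order.
    Toy model: one ack per segment, no delayed acks, no SACK, no window checks,
    and buffered segments are assumed to line up exactly (no partial overlaps).
    """
    out_of_order: Dict[int, int] = {}  # seq -> length, held until the hole fills
    acks: List[int] = []
    for seq, length in segments:
        end = seq + length
        if seq <= rcv_nxt < end:
            # In-order (or overlapping) data: advance the left edge of the window.
            rcv_nxt = end
            # Pull in any buffered segments that are now contiguous.
            while rcv_nxt in out_of_order:
                rcv_nxt += out_of_order.pop(rcv_nxt)
        elif seq > rcv_nxt:
            # A hole precedes this segment: buffer it; the ack cannot pass the hole.
            out_of_order[seq] = length
        # Otherwise end <= rcv_nxt: a pure duplicate, nothing changes.
        acks.append(rcv_nxt)
    return acks


# Packets 6, 8 and 11 from the trace arrive; packet 7 (1381:1492) does not.
print(cumulative_acks([(1, 1380), (1492, 1380), (2872, 1380)]))  # [1381, 1381, 1381]
```

If packet 7 finally arrived, the buffered segments would be drained at once and the ack would jump straight to 4252. A retransmission of packet 6 alone, as in packet 13, changes nothing and still produces ack 1381.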
> This is strange, because if tcpdump is
> running on your client, then the packet was at least seen by the
> NIC, but not by TCP. It might have been dropped for another
> reason at a higher point in the stack.

Again, it was only the masq gateway that saw the packet. There are no ipchains rules on the gateway other than the single MASQ rule.

> 11] 12:48:28.793873 bigip-www.us.oracle.com.www > pimlott.ne.mediaone.net.62080: . 2872:4252(1380) ack 102 win 64860 (DF)
>
> Oracle sends more data: bytes 2872 - 4251.
>
> 12] 12:48:28.803873 pimlott.ne.mediaone.net.62080 > bigip-www.us.oracle.com.www: . ack 1381 win 8280 (DF)
>
> The client sends an ack of 1381 again, since it has a hole in its
> receive sequence.
>
> 13] 12:48:32.843873 bigip-www.us.oracle.com.www > pimlott.ne.mediaone.net.62080: P 1:1381(1380) ack 102 win 8192
>
> Oracle for some reason resends the first packet (1 - 1380),
> which was actually acked by the client. Most likely Oracle
> didn't get the ack, and its retransmission timer went off.

That's especially suspicious, since this is an ack from the gateway, and I already know that the gateway can communicate fine with www.oracle.com. I re-did a tcpdump of a wget directly from the gateway (which I can send if it would help), and it looks totally normal. The ack of the first 1380 bytes looks like

17:03:18.903873 pimlott.ne.mediaone.net.4056 > bigip-www.us.oracle.com.www: . ack 1381 win 31740 (DF)

which is identical to the acks in the "bad" sequence, except for having a bigger window. Could that have anything to do with it?

Thanks for the explanations. This gives me a much better chance at doing further digging myself.

Andrew