This is a long post. Please bear with me. I don't think I'm off my rocker, though I'm willing to argue the point. I'm running RedHat 7.3 with their kernel 2.4.18-18.7x and their "printconf" printer interface package. Sometimes when printing over TCP via the "jetdirectprint" perl script that's part of printconf, the print job data gets truncated and the printer reports an error. I've collected traces of the application using strace and of the network traffic using ethereal and I've studied the various rfc's for TCP and I've convinced myself that the last thing I expected to find is true: the problem is a defect in the Linux TCP stack. The sequence of events is like this: - jetdirectprint opens a TCP connection to the printer and begins sending the data, and the printer begins accepting the data. No. Source Destination Info 1 192.168.1.184 192.168.1.2 37853 > jetdirect [SYN] Seq=1214590847 Ack=0 Win=5840 Len=0 2 192.168.1.2 192.168.1.184 jetdirect > 37853 [SYN, ACK] Seq=31547308 Ack=1214590848 Win=8760 Len=0 3 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214590848 Ack=31547309 Win=5840 Len=0 4 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1214590848 Ack=31547309 Win=5840 Len=15 5 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214590863 Ack=31547309 Win=5840 Len=1460 6 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309 Ack=1214592323 Win=7285 Len=0 7 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214592323 Ack=31547309 Win=5840 Len=1460 - While processing the data, the printer sends back a few bytes of status information (packet 164). But jetdirectprint never reads it, so the status info stays in the TCP receive queue. 162 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214744868 Ack=31547309 Win=5840 Len=1460 163 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1214746328 Ack=31547309 Win=5840 Len=1460 164 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309 Ack=1214746328 Win=1460 Len=45 165 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214747788 Ack=31547354 Win=5840 Len=0 166 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1214747788 Win=0 Len=0 167 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1214747788 Win=2920 Len=0 168 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214747788 Ack=31547354 Win=5840 Len=1460 - jetdirectprint finishes writing all the data then while some of the data is still in the transmit queue waiting to go out, jetdirectprint closes the socket. (strace this time.) write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73 write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73 write(4, "fffefffffffcfafffdfeffffffffffff"..., 43) = 43 write(4, "grestore\n", 9) = 9 write(4, " grestore \n", 11) = 11 write(4, " grestore \n", 11) = 11 write(4, "pagelevel restore\n", 18) = 18 write(4, "showpage\n", 9) = 9 write(4, "%%Trailer\n", 10) = 10 write(4, "%%Pages: 1\n", 11) = 11 write(4, "%%EOF\n", 6) = 6 read(0, "", 4096) = 0 close(4) = 0 - But in tcp_close(), when close() is called on a socket the receive queue is immediately flushed, and if it wasn't empty a RST is sent with a SEQ corresponding to the last data sent, the connection is torn down and the remaining unsent transmit data is lost. 946 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1215499688 Ack=31547354 Win=5840 Len=1460 947 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1215501148 Win=2920 Len=0 948 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1215501148 Ack=31547354 Win=5840 Len=1460 949 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1215502608 Ack=31547354 Win=5840 Len=1460 950 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1215504068 Win=2920 Len=0 951 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1215504068 Ack=31547354 Win=5840 Len=1460 952 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1215505528 Ack=31547354 Win=5840 Len=1460 953 192.168.1.184 192.168.1.2 37853 > jetdirect [RST, ACK] Seq=1215506988 Ack=31547354 Win=5840 Len=0 <no more packets exchanged> I've read rfc793, rfc1122, and rfc2525, and I understand that tcp_close() is trying to do a half duplex close by sending a RST instead of a FIN when the application closes the socket when there's queued recieve data. But I think it's jumping the gun by doing it immediately. It should wait until the transmit queue is empty before processing the CLOSE and deciding whether to send a FIN or a RST. There's some basis to support that in rfc793 section 3.9 regarding CLOSE on an ESTABLISHED connection: CLOSE call ESTABLISHED STATE Queue this until all preceding SENDs have been segmentized, then form a FIN segment and send it. In any case, enter FIN-WAIT-1 state. That's saying the CLOSE shouldn't be processed until all preceding SENDs are done. In this case that would mean neither a FIN nor a RST would be sent when the CLOSE call was made, then when the transmit queue was emptied the receive queue would be cleared and a RST would be sent to indicate to the printer that the status info was lost. Comments? Jim - : send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html