Hi Jim, The TCP option is SO_LINGER. I don't know about perl, so the code I given maybe contain bugs. Another way is how about this? s = socket() connect(s) while read(STDIN) { write(s) while (read(s)) { } } close(s) Just on your client side, perl read all status data which printer sent to you. I am not sure about if read(s) will block your code or not. You need to investigate on this. Cheers Leigh -----Original Message----- From: Jim Castleberry [mailto:cjn@cypress.com] Sent: Thursday, November 28, 2002 9:41 AM To: leighw@softtech.co.nz Cc: linux-net@vger.kernel.org Subject: Re: close() with queued receive data loses transmit data Huh? jetdirectprint is perl code that just does s = socket() connect(s) while read(STDIN) write(s) close(s) What socket option would you suggest modifying to fix it? The only workaround I know of at the application layer is to write(), then shutdown(s, SHUT_WR), then read() until an EOF indicates the other end has closed the connection, then close() - in other words to explicitly do a full duplex close. You're not seriously suggesting that every app that writes to a descriptor that might sometimes be a socket should be forced to do all that to guarantee that no data is lost, are you? Jim Leigh Wanstead wrote: >>>I've convinced myself that the last thing I expected to find is true: >>>the problem is a defect in the Linux TCP stack. >> > > IMHO, it is not a defect in the Linux TCP stack. That is jetdirectprint's > problem. > I don't know how to fix jetdirectprint's problem. I guess that if you got > source code of jetdirectprint. You need to modify socket option to get what > you want. > > -----Original Message----- > From: linux-net-owner@vger.kernel.org > [mailto:linux-net-owner@vger.kernel.org]On Behalf Of Jim Castleberry > Sent: Thursday, November 28, 2002 8:59 AM > To: linux-net@vger.kernel.org > Subject: close() with queued receive data loses transmit data > > > This is a long post. Please bear with me. I don't think I'm off my > rocker, though I'm willing to argue the point. > > I'm running RedHat 7.3 with their kernel 2.4.18-18.7x and their > "printconf" printer interface package. Sometimes when printing over > TCP via the "jetdirectprint" perl script that's part of printconf, the > print job data gets truncated and the printer reports an error. I've > collected traces of the application using strace and of the network > traffic using ethereal and I've studied the various rfc's for TCP and > I've convinced myself that the last thing I expected to find is true: > the problem is a defect in the Linux TCP stack. > > The sequence of events is like this: > > - jetdirectprint opens a TCP connection to the printer and begins > sending the data, and the printer begins accepting the data. > No. Source Destination Info > 1 192.168.1.184 192.168.1.2 37853 > jetdirect [SYN] Seq=1214590847 > Ack=0 Win=5840 Len=0 > 2 192.168.1.2 192.168.1.184 jetdirect > 37853 [SYN, ACK] Seq=31547308 > Ack=1214590848 Win=8760 Len=0 > 3 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214590848 > Ack=31547309 Win=5840 Len=0 > 4 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1214590848 > Ack=31547309 Win=5840 Len=15 > 5 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214590863 > Ack=31547309 Win=5840 Len=1460 > 6 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309 > Ack=1214592323 Win=7285 Len=0 > 7 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214592323 > Ack=31547309 Win=5840 Len=1460 > > - While processing the data, the printer sends back a few bytes of > status information (packet 164). But jetdirectprint never reads it, > so the status info stays in the TCP receive queue. > 162 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214744868 > Ack=31547309 Win=5840 Len=1460 > 163 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1214746328 > Ack=31547309 Win=5840 Len=1460 > 164 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309 > Ack=1214746328 Win=1460 Len=45 > 165 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214747788 > Ack=31547354 Win=5840 Len=0 > 166 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 > Ack=1214747788 Win=0 Len=0 > 167 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 > Ack=1214747788 Win=2920 Len=0 > 168 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1214747788 > Ack=31547354 Win=5840 Len=1460 > > - jetdirectprint finishes writing all the data then while some of the > data is still in the transmit queue waiting to go out, jetdirectprint > closes the socket. (strace this time.) > write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73 > write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73 > write(4, "fffefffffffcfafffdfeffffffffffff"..., 43) = 43 > write(4, "grestore\n", 9) = 9 > write(4, " grestore \n", 11) = 11 > write(4, " grestore \n", 11) = 11 > write(4, "pagelevel restore\n", 18) = 18 > write(4, "showpage\n", 9) = 9 > write(4, "%%Trailer\n", 10) = 10 > write(4, "%%Pages: 1\n", 11) = 11 > write(4, "%%EOF\n", 6) = 6 > read(0, "", 4096) = 0 > close(4) = 0 > > - But in tcp_close(), when close() is called on a socket the receive queue > is immediately flushed, and if it wasn't empty a RST is sent with a SEQ > corresponding to the last data sent, the connection is torn down and the > remaining unsent transmit data is lost. > 946 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1215499688 > Ack=31547354 Win=5840 Len=1460 > 947 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 > Ack=1215501148 Win=2920 Len=0 > 948 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1215501148 > Ack=31547354 Win=5840 Len=1460 > 949 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1215502608 > Ack=31547354 Win=5840 Len=1460 > 950 192.168.1.2 192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 > Ack=1215504068 Win=2920 Len=0 > 951 192.168.1.184 192.168.1.2 37853 > jetdirect [ACK] Seq=1215504068 > Ack=31547354 Win=5840 Len=1460 > 952 192.168.1.184 192.168.1.2 37853 > jetdirect [PSH, ACK] Seq=1215505528 > Ack=31547354 Win=5840 Len=1460 > 953 192.168.1.184 192.168.1.2 37853 > jetdirect [RST, ACK] Seq=1215506988 > Ack=31547354 Win=5840 Len=0 > <no more packets exchanged> > > > I've read rfc793, rfc1122, and rfc2525, and I understand that tcp_close() > is trying to do a half duplex close by sending a RST instead of a FIN when > the application closes the socket when there's queued recieve data. But I > think it's jumping the gun by doing it immediately. It should wait until > the transmit queue is empty before processing the CLOSE and deciding whether > to send a FIN or a RST. There's some basis to support that in rfc793 > section 3.9 regarding CLOSE on an ESTABLISHED connection: > > CLOSE call > ESTABLISHED STATE > Queue this until all preceding SENDs have been > segmentized, then form a FIN segment and send it. > In any case, enter FIN-WAIT-1 state. > > That's saying the CLOSE shouldn't be processed until all preceding SENDs > are done. In this case that would mean neither a FIN nor a RST would be > sent when the CLOSE call was made, then when the transmit queue was > emptied the receive queue would be cleared and a RST would be sent to > indicate to the printer that the status info was lost. > > Comments? > > Jim > > > - > : send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > - > : send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html - : send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html