close() with queued receive data loses transmit data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a long post.  Please bear with me.  I don't think I'm off my
rocker, though I'm willing to argue the point.

I'm running RedHat 7.3 with their kernel 2.4.18-18.7x and their
"printconf" printer interface package.  Sometimes when printing over
TCP via the "jetdirectprint" perl script that's part of printconf, the
print job data gets truncated and the printer reports an error.  I've
collected traces of the application using strace and of the network
traffic using ethereal and I've studied the various rfc's for TCP and
I've convinced myself that the last thing I expected to find is true:
the problem is a defect in the Linux TCP stack.

The sequence of events is like this:

- jetdirectprint opens a TCP connection to the printer and begins
  sending the data, and the printer begins accepting the data.
 No.   Source      Destination  Info
  1 192.168.1.184 192.168.1.2   37853 > jetdirect [SYN] Seq=1214590847 Ack=0 Win=5840 Len=0
  2 192.168.1.2   192.168.1.184 jetdirect > 37853 [SYN, ACK] Seq=31547308 Ack=1214590848 Win=8760 Len=0
  3 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214590848 Ack=31547309 Win=5840 Len=0
  4 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1214590848 Ack=31547309 Win=5840 Len=15
  5 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214590863 Ack=31547309 Win=5840 Len=1460
  6 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309 Ack=1214592323 Win=7285 Len=0
  7 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214592323 Ack=31547309 Win=5840 Len=1460

- While processing the data, the printer sends back a few bytes of
  status information (packet 164).  But jetdirectprint never reads it,
  so the status info stays in the TCP receive queue.
162 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214744868 Ack=31547309 Win=5840 Len=1460
163 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1214746328 Ack=31547309 Win=5840 Len=1460
164 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309 Ack=1214746328 Win=1460 Len=45
165 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214747788 Ack=31547354 Win=5840 Len=0
166 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1214747788 Win=0 Len=0
167 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1214747788 Win=2920 Len=0
168 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214747788 Ack=31547354 Win=5840 Len=1460

- jetdirectprint finishes writing all the data then while some of the
  data is still in the transmit queue waiting to go out, jetdirectprint
  closes the socket.  (strace this time.)
write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73
write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73
write(4, "fffefffffffcfafffdfeffffffffffff"..., 43) = 43
write(4, "grestore\n", 9)               = 9
write(4, " grestore \n", 11)            = 11
write(4, " grestore \n", 11)            = 11
write(4, "pagelevel restore\n", 18)     = 18
write(4, "showpage\n", 9)               = 9
write(4, "%%Trailer\n", 10)             = 10
write(4, "%%Pages: 1\n", 11)            = 11
write(4, "%%EOF\n", 6)                  = 6
read(0, "", 4096)                       = 0
close(4)                                = 0

- But in tcp_close(), when close() is called on a socket the receive queue
  is immediately flushed, and if it wasn't empty a RST is sent with a SEQ
  corresponding to the last data sent, the connection is torn down and the
  remaining unsent transmit data is lost.
946 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1215499688 Ack=31547354 Win=5840 Len=1460
947 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1215501148 Win=2920 Len=0
948 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1215501148 Ack=31547354 Win=5840 Len=1460
949 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1215502608 Ack=31547354 Win=5840 Len=1460
950 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354 Ack=1215504068 Win=2920 Len=0
951 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1215504068 Ack=31547354 Win=5840 Len=1460
952 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1215505528 Ack=31547354 Win=5840 Len=1460
953 192.168.1.184 192.168.1.2   37853 > jetdirect [RST, ACK] Seq=1215506988 Ack=31547354 Win=5840 Len=0
<no more packets exchanged>


I've read rfc793, rfc1122, and rfc2525, and I understand that tcp_close()
is trying to do a half duplex close by sending a RST instead of a FIN when
the application closes the socket when there's queued recieve data.  But I
think it's jumping the gun by doing it immediately.  It should wait until
the transmit queue is empty before processing the CLOSE and deciding whether
to send a FIN or a RST.  There's some basis to support that in rfc793
section 3.9 regarding CLOSE on an ESTABLISHED connection:

  CLOSE call
    ESTABLISHED STATE
      Queue this until all preceding SENDs have been
      segmentized, then form a FIN segment and send it.
      In any case, enter FIN-WAIT-1 state.

That's saying the CLOSE shouldn't be processed until all preceding SENDs
are done.  In this case that would mean neither a FIN nor a RST would be
sent when the CLOSE call was made, then when the transmit queue was
emptied the receive queue would be cleared and a RST would be sent to
indicate to the printer that the status info was lost.

Comments?

Jim


-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Netdev]     [Ethernet Bridging]     [Linux 802.1Q VLAN]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Git]     [Bugtraq]     [Yosemite News and Information]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux PCI]     [Linux Admin]     [Samba]

  Powered by Linux