Re: close() with queued receive data loses transmit data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Huh?  jetdirectprint is perl code that just does
    s = socket()
    connect(s)
    while read(STDIN)
        write(s)
    close(s)
What socket option would you suggest modifying to fix it?

The only workaround I know of at the application layer is to write(), then
shutdown(s, SHUT_WR), then read() until an EOF indicates the other end has
closed the connection, then close() - in other words to explicitly do a full
duplex close.

You're not seriously suggesting that every app that writes to a descriptor
that might sometimes be a socket should be forced to do all that to guarantee
that no data is lost, are you?

Jim

Leigh Wanstead wrote:
I've convinced myself that the last thing I expected to find is true:
the problem is a defect in the Linux TCP stack.

IMHO, it is not a defect in the Linux TCP stack. That is jetdirectprint's
problem.
I don't know how to fix jetdirectprint's problem. I guess that if you got
source code of jetdirectprint. You need to modify socket option to get what
you want.

-----Original Message-----
From: linux-net-owner@vger.kernel.org
[mailto:linux-net-owner@vger.kernel.org]On Behalf Of Jim Castleberry
Sent: Thursday, November 28, 2002 8:59 AM
To: linux-net@vger.kernel.org
Subject: close() with queued receive data loses transmit data


This is a long post.  Please bear with me.  I don't think I'm off my
rocker, though I'm willing to argue the point.

I'm running RedHat 7.3 with their kernel 2.4.18-18.7x and their
"printconf" printer interface package.  Sometimes when printing over
TCP via the "jetdirectprint" perl script that's part of printconf, the
print job data gets truncated and the printer reports an error.  I've
collected traces of the application using strace and of the network
traffic using ethereal and I've studied the various rfc's for TCP and
I've convinced myself that the last thing I expected to find is true:
the problem is a defect in the Linux TCP stack.

The sequence of events is like this:

- jetdirectprint opens a TCP connection to the printer and begins
   sending the data, and the printer begins accepting the data.
  No.   Source      Destination  Info
   1 192.168.1.184 192.168.1.2   37853 > jetdirect [SYN] Seq=1214590847
Ack=0 Win=5840 Len=0
   2 192.168.1.2   192.168.1.184 jetdirect > 37853 [SYN, ACK] Seq=31547308
Ack=1214590848 Win=8760 Len=0
   3 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214590848
Ack=31547309 Win=5840 Len=0
   4 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1214590848
Ack=31547309 Win=5840 Len=15
   5 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214590863
Ack=31547309 Win=5840 Len=1460
   6 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309
Ack=1214592323 Win=7285 Len=0
   7 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214592323
Ack=31547309 Win=5840 Len=1460

- While processing the data, the printer sends back a few bytes of
   status information (packet 164).  But jetdirectprint never reads it,
   so the status info stays in the TCP receive queue.
162 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214744868
Ack=31547309 Win=5840 Len=1460
163 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1214746328
Ack=31547309 Win=5840 Len=1460
164 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547309
Ack=1214746328 Win=1460 Len=45
165 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214747788
Ack=31547354 Win=5840 Len=0
166 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354
Ack=1214747788 Win=0 Len=0
167 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354
Ack=1214747788 Win=2920 Len=0
168 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1214747788
Ack=31547354 Win=5840 Len=1460

- jetdirectprint finishes writing all the data then while some of the
   data is still in the transmit queue waiting to go out, jetdirectprint
   closes the socket.  (strace this time.)
write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73
write(4, "ffffffffffffffffffffffffffffffff"..., 73) = 73
write(4, "fffefffffffcfafffdfeffffffffffff"..., 43) = 43
write(4, "grestore\n", 9)               = 9
write(4, " grestore \n", 11)            = 11
write(4, " grestore \n", 11)            = 11
write(4, "pagelevel restore\n", 18)     = 18
write(4, "showpage\n", 9)               = 9
write(4, "%%Trailer\n", 10)             = 10
write(4, "%%Pages: 1\n", 11)            = 11
write(4, "%%EOF\n", 6)                  = 6
read(0, "", 4096)                       = 0
close(4)                                = 0

- But in tcp_close(), when close() is called on a socket the receive queue
   is immediately flushed, and if it wasn't empty a RST is sent with a SEQ
   corresponding to the last data sent, the connection is torn down and the
   remaining unsent transmit data is lost.
946 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1215499688
Ack=31547354 Win=5840 Len=1460
947 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354
Ack=1215501148 Win=2920 Len=0
948 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1215501148
Ack=31547354 Win=5840 Len=1460
949 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1215502608
Ack=31547354 Win=5840 Len=1460
950 192.168.1.2   192.168.1.184 jetdirect > 37853 [ACK] Seq=31547354
Ack=1215504068 Win=2920 Len=0
951 192.168.1.184 192.168.1.2   37853 > jetdirect [ACK] Seq=1215504068
Ack=31547354 Win=5840 Len=1460
952 192.168.1.184 192.168.1.2   37853 > jetdirect [PSH, ACK] Seq=1215505528
Ack=31547354 Win=5840 Len=1460
953 192.168.1.184 192.168.1.2   37853 > jetdirect [RST, ACK] Seq=1215506988
Ack=31547354 Win=5840 Len=0
<no more packets exchanged>


I've read rfc793, rfc1122, and rfc2525, and I understand that tcp_close()
is trying to do a half duplex close by sending a RST instead of a FIN when
the application closes the socket when there's queued recieve data.  But I
think it's jumping the gun by doing it immediately.  It should wait until
the transmit queue is empty before processing the CLOSE and deciding whether
to send a FIN or a RST.  There's some basis to support that in rfc793
section 3.9 regarding CLOSE on an ESTABLISHED connection:

   CLOSE call
     ESTABLISHED STATE
       Queue this until all preceding SENDs have been
       segmentized, then form a FIN segment and send it.
       In any case, enter FIN-WAIT-1 state.

That's saying the CLOSE shouldn't be processed until all preceding SENDs
are done.  In this case that would mean neither a FIN nor a RST would be
sent when the CLOSE call was made, then when the transmit queue was
emptied the receive queue would be cleared and a RST would be sent to
indicate to the printer that the status info was lost.

Comments?

Jim


-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Netdev]     [Ethernet Bridging]     [Linux 802.1Q VLAN]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Git]     [Bugtraq]     [Yosemite News and Information]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux PCI]     [Linux Admin]     [Samba]

  Powered by Linux