Re: socket close

skaller <skaller@xxxxxxxxxxxxxxxxxxxxx> · Thu, 05 Jul 2007 17:29:58 +1000

On Thu, 2007-07-05 at 08:25 +0200, Rhythmic Fistman wrote:
> On 7/5/07, skaller <skaller@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, 2007-07-04 at 14:27 -0400, Charlie Brady wrote:
> > > On Thu, 5 Jul 2007, skaller wrote:
> > >
> > > > What you seem to be saying is that sockets are fundamentally
> > > > screwed ...
> > >
> > > I don't think you are the first to suggest that:
> > >
> > > http://cr.yp.to/tcpip/twofd.html
> >
> > Sure, but the writer is complaining about needing 'shutdown'.
> > I can't get it to work even with 'shutdown'.
> >
> > So it looks like the Linux kernel is bugged, it's sending
> > RST too early?
> 
> John, what you wrote fixes my problems on linux, it almost is a
> lingering close. You needed the shutdown, the sleep and to CONTINUE
> reading the data, not just one byte.

Continuing to read the data for how long? Generally that isn't
viable. The web server reads what it needs and stops reading.
Perhaps it should shut down input at that point.

> Also, I would change the shutdown to 1 (write only) and the sleep to 2 like
> in the apache code.
> 
>       Faio_posix::shutdown(s,2); // render socket unusable?
>       Faio::sleep (Faio::sys_clock,5.0);
>       var len = 1; var eof = false; var buf = C_hack::malloc(1);
>       Faio_posix::async_read(s, &len, buf, &eof);
>       fprint (cerr,q"STREAM:socket $s, eof=$eof\n");
>       Faio_posix::shutdown(s,0);
>       Faio_posix::close s;
> 
> not sure how to write it properly, we don't really have a
> "read for n seconds" primitive.

True, we don't, because it is nonsense from an application
viewpoint. We could read some data, delay, and read
again. If the read hangs, that's fine: epoll will error it
out of the hang when the socket is actually closed.

The problem is, there's no assurance this procedure will
actually work. The delay is required to prevent a DoS attack
by a peer sending infinite data, but the delay also allows more
data to come in.

Of course this won't happen with a proper web client,
but it may well happen if the library is used to implement,
for example, a chat system.
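
For concreteness, here is a rough sketch of the bounded
"shutdown, keep reading, then close" procedure discussed above.
It is in Python on plain blocking sockets, not Felix/Faio; the name
lingering_close and the max_wait and chunk parameters are made up
for illustration. It shows the deadline that stops an endless sender
from pinning the closer; it does not show that the final output
actually gets delivered.

```python
import socket
import time

def lingering_close(sock, max_wait=2.0, chunk=4096):
    """Half-close, drain input until EOF or a deadline, then close.

    Returns True if the peer's EOF was seen before the deadline.
    Hypothetical helper: name and parameters are illustrative only.
    """
    deadline = time.monotonic() + max_wait
    saw_eof = False
    try:
        sock.shutdown(socket.SHUT_WR)   # send FIN, keep the read side open
    except OSError:
        pass                            # peer may already have reset
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                   # bounded wait: give up on a slow or endless peer
            sock.settimeout(remaining)
            try:
                data = sock.recv(chunk)
            except OSError:
                break                   # timeout or connection error
            if not data:
                saw_eof = True          # peer closed its side
                break
    finally:
        sock.close()
    return saw_eof
```

The return value only says whether the peer finished first; a False
result is exactly the case where more data could still have arrived.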

The closer needs to (a) stop reading input, (b) write
final output, and then (c) close, knowing that the output
written in (b) will be sent 'if possible'. If the recipient
doesn't read it, or ignores it, that's fine: but if the
transmission is destroyed by the close operation it isn't.

The close needs to be delayed for a period or until
the data is sent, whichever comes first, and this
must be done by the TCP/IP stack because it CANNOT be
done by the application. The OS provides buffering,
telling the caller a write has succeeded when actually
the data hasn't been sent yet, so the OS is responsible
for at least trying to honor its return code.
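
The standard socket option for asking the stack to hold a close
until queued data is sent or a timeout passes is SO_LINGER. The
sketch below only shows how the option is set and read back from
Python; set_linger and get_linger are hypothetical helper names,
and whether the option behaves usefully on non-blocking sockets
driven by epoll is an open assumption, not something shown here.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } on common POSIX systems
_LINGER_FMT = "ii"

def set_linger(sock, seconds):
    """Enable SO_LINGER with the given timeout in seconds (illustrative helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(_LINGER_FMT, 1, seconds))

def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(_LINGER_FMT))
    return struct.unpack(_LINGER_FMT, raw)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_linger(s, 5)
    print(get_linger(s))
    s.close()
```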

> Anyway, I also don't think this should be part of the general socket
> close... it's a protocol problem.

But we have to somehow work around the fact that TCP/IP
sockets are bugged ;(

> It's nasty having all fthreads blocking often uselessly for n seconds
> in close, so we could have a separate lingerer thread that handles all
> the sockets that want to be lingeringly closed.

That won't work. The average time to close a socket must not
exceed the interval between connections. With a connection rate
of 500 per second, the timeout for a linear queue would have to be
2 ms per socket. That connection rate is required for
an analogue telephone switch. Hmm .. at 100K bytes per second
(1M ADSL connection) you'd only be able to send 200 bytes
in that time, not nearly enough to clear a 64K buffer.
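
The arithmetic behind those figures, as a small Python check. The
inputs (500 connections per second, 100K bytes per second, a 64K
buffer) are the ones quoted above; the variable names are made up,
and the last two lines are a derived estimate, not a measurement.

```python
conn_rate = 500                 # connections per second
link_rate = 100_000             # bytes per second (roughly a 1M ADSL line)
buffer_size = 64 * 1024         # bytes queued in the socket buffer

# A single queue closing sockets one at a time gets this long per socket.
budget = 1.0 / conn_rate                    # seconds per socket

# Bytes that can leave the buffer within that budget.
bytes_in_budget = link_rate * budget

# Time actually needed to drain a full buffer at that link rate.
drain_time = buffer_size / link_rate        # seconds

# Sockets that would be draining at once if each took drain_time.
sockets_draining = conn_rate * drain_time

print(f"budget per socket : {budget * 1000:.1f} ms")
print(f"bytes sent in time: {bytes_in_budget:.0f}")
print(f"time to drain 64K : {drain_time:.3f} s")
print(f"sockets draining  : {sockets_draining:.0f}")
```

So a full buffer needs over 300 times the 2 ms budget.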

The alternative is a pthread per socket, but that means
lots of pthreads which is precisely what our whole
system is trying to avoid, since we know it kills
most OS schedulers.

It's bad enough having a large pool of used sockets;
Linux probably can't handle that either. However,
for a large number of short connections, that should
not be necessary. Again, Linux might not handle a
high connection rate. I doubt Windows can.
Solaris or HP-UX might, but there's no way to find
out without a working asynchronous socket library:
if we have to block up the system at the application
level it becomes pointless to even test it.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net