On Thu, 2007-07-05 at 08:25 +0200, Rhythmic Fistman wrote:
> On 7/5/07, skaller <skaller@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, 2007-07-04 at 14:27 -0400, Charlie Brady wrote:
> > > On Thu, 5 Jul 2007, skaller wrote:
> > >
> > > > What you seem to be saying is that sockets are fundamentally
> > > > screwed ...
> > >
> > > I don't think you are the first to suggest that:
> > >
> > > http://cr.yp.to/tcpip/twofd.html
> >
> > Sure, but the writer is complaining about needing 'shutdown'.
> > I can't get it to work even with 'shutdown'.
> >
> > So it looks like the Linux kernel is bugged, it's sending
> > RST too early?
>
> John, what you wrote fixes my problems on linux, it almost is a
> lingering close. You needed the shutdown, the sleep and to CONTINUE
> reading the data, not just one byte.

Continuing to read the data for how long? Generally that isn't viable.
The web server reads what it needs and stops reading. Perhaps it should
shut down input at that point.

> Also, I would change the shutdown to 1 (write only) and the sleep to 2 like
> in the apache code.
>
> Faio_posix::shutdown(s,2); // render socket unusable?
> Faio::sleep (Faio::sys_clock,5.0);
> var len = 1; var eof = false; var buf = C_hack::malloc(1);
> Faio_posix::async_read(s, &len, buf, &eof);
> fprint (cerr,q"STREAM:socket $s, eof=$eof\n");
> Faio_posix::shutdown(s,0);
> Faio_posix::close s;
>
> not sure how to write it properly, we don't really have a
> "read for n seconds" primitive.

True, we don't, because it is nonsense from an application viewpoint.
We could read some stuff, insert a delay, and read again. If the read
hangs, that's fine: epoll will error it out of the hang when the socket
is actually closed.

The problem is, there's no assurance this procedure will actually work.
The delay is required to prevent a denial-of-service (DoS) attack by a
peer sending unbounded data, but the delay also allows more data to come
in. Of course this won't happen with a proper web client..
but it may well happen if the library is used to implement, for example,
a chat system.

The closer needs to (a) stop reading input, (b) write final output, and
then (c) close, knowing that the output written in (b) will be sent 'if
possible'. If the recipient doesn't read it, or ignores it, that's fine:
but if the transmission is destroyed by the close operation, it isn't.

The close needs to be delayed for a period or until the data is sent,
whichever comes first .. and this must be done by the TCP/IP stack
because it CANNOT be done by the application. The OS provides buffering,
telling the caller a write has succeeded when actually it hasn't yet ..
so the OS is responsible for at least trying to honor its return code.

> Anyway, I also don't think this should be part of the general socket
> close... it's a protocol problem.

But we have to somehow work around the fact that TCP/IP sockets are
bugged ;(

> It's nasty having all fthreads blocking often uselessly for n seconds
> in close, so we could have a separate lingerer thread that handles all
> the sockets that want to be lingeringly closed.

That won't work. With a single linear queue, the lingerer has to close
sockets at least as fast as new connections arrive. With a connection
rate of 500 per second, the timeout for a linear queue would have to be
2 ms per socket. That connection rate is required for an analogue
telephone switch. Hmm .. at 100K bytes per second (1M ADSL connection)
you'd only be able to send 200 bytes in that time, not nearly enough to
clear a 64K buffer.

The alternative is a pthread per socket, but that means lots of pthreads,
which is precisely what our whole system is trying to avoid, since we
know it kills most OS schedulers. It's bad enough having a large pool of
used sockets; probably Linux can't handle that either.. however for a
large number of short connections, that should not be necessary.

Again, Linux might not handle a high connection rate. I doubt Windows
can. Solaris, or HP/UX might ..
but there's no way to find out without a working asynchronous socket
library: if we have to block up the system at the application level, it
becomes pointless to even test it.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
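For comparison, the application-level "lingering close" argued about in this thread (stop writing, drain input for a bounded time, then close, roughly in the spirit of Apache's approach) can be sketched in plain POSIX C. This is a minimal blocking sketch under stated assumptions, not Felix's or Apache's actual code: the function name lingering_close and its timeout_ms and max_bytes parameters are hypothetical. It assumes step (b), writing all final output, has already been done, and it caps the drain in both time and bytes so a peer cannot hold the socket open by sending data forever. The usual explanation for the early RST is that Linux's tcp_close() sends RST instead of FIN when unread data is still in the receive queue (following RFC 2525, section 2.17), which can destroy output the peer has not yet read; draining first avoids that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds from a monotonic clock (immune to wall-clock jumps). */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper, not a real library API: half-close, drain, close.
 * The caller must already have written all final output (step b).
 * Returns the number of input bytes discarded, or -1 if shutdown()
 * failed. The descriptor is closed in every case.
 */
long lingering_close(int fd, int timeout_ms, long max_bytes)
{
    char buf[4096];
    long drained = 0;
    long long deadline = now_ms() + timeout_ms;

    /* Queue a FIN behind any unsent output: we will write nothing more. */
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    /* Discard input until EOF, timeout, or the byte cap, so close()
       does not find unread data in the receive queue. */
    while (drained < max_bytes) {
        long long left = deadline - now_ms();
        if (left <= 0)
            break;                       /* overall time budget spent */
        struct pollfd p = { .fd = fd, .events = POLLIN };
        int r = poll(&p, 1, (int)left);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break;                       /* timeout or poll() error */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                       /* peer sent EOF, or read error */
        drained += n;
    }
    close(fd);
    return drained;
}
```

Note that this sketch blocks the calling thread for up to timeout_ms per socket, which is exactly the cost objected to above; an event-driven server would have to drive the same three steps from epoll instead of a blocking poll() loop, and the queueing arithmetic above still applies.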