Edwin Steiner wrote:
> On Wed, Oct 18, 2006 at 03:52:50PM +0200, Robert Schuster wrote:
>> Hi all,
>
> Hi!
>
>> the title looks like fun, eh? :)
>
> Not really, if you spend some time with that stuff. ;)
>
>> In an attempt to get gnu/testlet/java/net/ServerSocket/ReturnOnClose to succeed
>> on Cacao with the new an shiny VMChannel implementation I found out the Cacao's
>> Thread.interrupt() does not cause blocking system calls to be interrupted. A
>
> I found an even nastier problem (after debugging the whole day): If
> one thread is blocking in a system call on a file descriptor (in my case
> it is `accept`), and another thread closes this file descriptor, the
> blocking call does not return.
>
> What's even worse in the case of accept: The same file descriptor may
> later be opened by another thread, for example by creating a socket on a
> different port. Now an accept on this "new" file descriptor (the same fd
> number) is started. Big problem: The _old_ accept call is still running,
> and it is a race which of the accept calls will return when a connection
> comes in.
>
> This happens with CACAO if you run mauve tests like this:
>
>     cacao Harness java.net.HttpURLConnection
>
> The testlet gnu.testlet.java.net.HttpURLConnection.responseCodeTest
> creates a server that returns "505" responses. The ServerSocket is
> closed at the end of the test, but the following test
> (gnu.testlet.java.net.HttpURLConnection.responseHeadersTest) gets the
> same fd for its server, and since the old accept is still running, it is
> a race about whether the test gets a "505" or a "200" response,
> **even though the new server uses a different port**.
>
> BTW google told me, that we are not the only ones having this problem:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4344135
>
> This is really a nasty case.
> I fear it requires special infrastructure
> for `close` and blocking system calls, so the blocking threads are
> interrupted, and the file descriptors get invalidated.
>
> -Edwin

Hi Edwin,

Yes. I think I ran into this problem in kaffe too. My main problem at
that time was with the socket layer (which is exactly what you are
describing here). The conclusion was to always use select/poll when you
expect to block for some time (in practice they turn out to be more
interruptible than read/write) and to use shutdown for network sockets.

Moreover, for the Thread.interrupt call to succeed, you need special
infrastructure to propagate the interrupt: you cannot rely on a UNIX
signal alone to trigger the right error in the blocking syscall (in that
case select/poll).

In the end, to get a thread-safe accept() syscall, we need to
cross-check several pieces of information before actually using the
file descriptor.

Guilhem.

P.S.: I wonder whether this can also happen for files accessed over the
network, through NFS for example. Probably most blocking I/O calls must
be cross-checked.
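
To make the select-plus-cross-check idea concrete, here is a minimal sketch. It is written in Python for brevity rather than the C a VM would use, and it is not taken from kaffe or CACAO: the names InterruptibleListener and ListenerClosed, the wakeup socket pair and the waiter counter are all hypothetical. The blocking thread sleeps in select() instead of accept(), close() only marks the listener closed and wakes the sleepers, and the descriptors are released only once no thread is still waiting on them, so the fd number cannot be reused underneath a blocked call.

```python
import select
import socket
import threading


class ListenerClosed(Exception):
    """Raised in a waiting thread when another thread closes the listener."""


class InterruptibleListener:
    """Hypothetical wrapper: threads wait in select(), never in accept()."""

    def __init__(self, host="127.0.0.1", port=0):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.bind((host, port))
        self._sock.listen(5)
        self._sock.setblocking(False)  # accept() itself must never block
        # Wakeup channel (assumption): close() sends one byte so that every
        # thread sleeping in select() returns.
        self._wake_r, self._wake_w = socket.socketpair()
        self._lock = threading.Lock()
        self._closed = False
        self._released = False
        self._waiters = 0
        self.port = self._sock.getsockname()[1]

    def accept(self):
        with self._lock:
            if self._closed:
                raise ListenerClosed()
            self._waiters += 1
        try:
            while True:
                # Blocks until a connection arrives or close() wakes us up.
                select.select([self._sock, self._wake_r], [], [])
                with self._lock:
                    # Cross-check our own state before using the descriptor.
                    if self._closed:
                        raise ListenerClosed()
                    try:
                        conn, _addr = self._sock.accept()
                    except BlockingIOError:
                        continue  # another waiter took the connection
                    conn.setblocking(True)
                    return conn
        finally:
            with self._lock:
                self._waiters -= 1
                self._release_if_idle()

    def close(self):
        with self._lock:
            if self._closed:
                return
            self._closed = True
            self._wake_w.send(b"x")  # never read back, so all waiters wake
            self._release_if_idle()

    def _release_if_idle(self):
        # Caller holds the lock. The real close is deferred until no thread
        # can still be sleeping on these descriptors.
        if self._closed and self._waiters == 0 and not self._released:
            for s in (self._sock, self._wake_r, self._wake_w):
                s.close()
            self._released = True
```

With this arrangement a thread blocked in accept() gets ListenerClosed as soon as another thread calls close(), instead of lingering and racing with a later listener that happens to receive the same fd number. A real VM would also need to hook Thread.interrupt into the same wakeup path, which this sketch leaves out.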