Re: Signals, sockets and threads

Edwin Steiner wrote:
> On Wed, Oct 18, 2006 at 03:52:50PM +0200, Robert Schuster wrote:
>> Hi all,
> 
> Hi!
> 
>> the title looks like fun, eh? :)
> 
> Not really, if you spend some time with that stuff. ;)
> 
>> In an attempt to get gnu/testlet/java/net/ServerSocket/ReturnOnClose to succeed
>> on Cacao with the new and shiny VMChannel implementation I found out that Cacao's
>> Thread.interrupt() does not cause blocking system calls to be interrupted. A
> 
> I found an even nastier problem (after debugging the whole day): If
> one thread is blocking in a system call on a file descriptor (in my case
> it is `accept`), and another thread closes this file descriptor, the
> blocking call does not return.
> 
> What's even worse in the case of accept: The same file descriptor may
> later be opened by another thread, for example by creating a socket on a
> different port. Now an accept on this "new" file descriptor (the same fd
> number) is started. Big problem: The _old_ accept call is still running,
> and it is a race which of the accept calls will return when a connection
> comes in.
> 
> This happens with CACAO if you run mauve tests like this:
> 
>     cacao Harness java.net.HttpURLConnection
> 
> The testlet gnu.testlet.java.net.HttpURLConnection.responseCodeTest
> creates a server that returns "505" responses. The ServerSocket is
> closed at the end of the test, but the following test
> (gnu.testlet.java.net.HttpURLConnection.responseHeadersTest) gets the
> same fd for its server, and since the old accept is still running, it is
> a race about whether the test gets a "505" or a "200" response,
> **even though the new server uses a different port**.
> 
> BTW, Google told me that we are not the only ones having this problem:
> 
>     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4344135
> 
> This is really a nasty case. I fear it requires special infrastructure
> for `close` and blocking system calls, so the blocking threads are
> interrupted, and the file descriptors get invalidated.
> 
> -Edwin


Hi Edwin,

Yes. I think I ran into this problem in Kaffe too. My main problem at
that time was with the socket layer (which is what you are describing
here). The conclusion was to always use select/poll when you expect to
block for some time (they turn out to be more interruptible than
read/write in practice) and to use shutdown for network sockets.
Moreover, for the Thread.interrupt call to succeed, you need special
infrastructure to propagate the interrupt: you cannot rely on a UNIX
signal alone to trigger the right error in the blocking syscall (in
that case select/poll).
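
To make the select/poll idea concrete, here is a rough sketch in Python
(a VM would do the equivalent in C). It assumes a wakeup pipe as the
"special infrastructure"; the names InterruptibleAcceptor, interrupt and
demo are made up for illustration and are not taken from Kaffe or CACAO.
The waiting thread blocks in select() on both the listening socket and
the pipe, so another thread can wake it by writing a single byte.

```python
import os
import select
import socket
import threading


class InterruptibleAcceptor:
    """Hypothetical wrapper: waits in select() so another thread can wake it."""

    def __init__(self, listener):
        self.listener = listener
        # Wakeup pipe: one byte written to wake_w makes select() return.
        self.wake_r, self.wake_w = os.pipe()

    def accept(self):
        # Wait on the listener and the wakeup pipe together; the thread
        # never sits inside accept() itself while nothing is pending.
        readable, _, _ = select.select([self.listener, self.wake_r], [], [])
        if self.wake_r in readable:
            os.read(self.wake_r, 1)  # consume the wakeup byte
            raise InterruptedError("accept interrupted by another thread")
        return self.listener.accept()

    def interrupt(self):
        # Called from the interrupting (or closing) thread.
        os.write(self.wake_w, b"x")

    def close(self):
        os.close(self.wake_r)
        os.close(self.wake_w)
        self.listener.close()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    acceptor = InterruptibleAcceptor(listener)
    outcome = []

    def worker():
        try:
            acceptor.accept()
            outcome.append("accepted")
        except InterruptedError:
            outcome.append("interrupted")

    t = threading.Thread(target=worker)
    t.start()
    acceptor.interrupt()  # works whether or not the worker is in select() yet
    t.join(timeout=5)
    acceptor.close()
    return outcome
```

Because the wakeup byte stays in the pipe until it is read, the interrupt
is not lost if it arrives before the worker reaches select(). The sketch
leaves the listener in blocking mode for brevity; real code would
probably make it non-blocking so that accept() cannot stall if the
pending connection disappears between select() and accept().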

In the end, to get a thread-safe accept() syscall, we need to cross-check
several pieces of information before actually using the file descriptor.
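
One possible reading of that cross-check, sketched in Python: keep a
generation counter per fd number, hand out (fd, generation) handles, and
verify the handle before touching the fd. This is only an assumption
about what such bookkeeping could look like; FdRegistry and its methods
are hypothetical names, and the fd number 7 in the demo is arbitrary.

```python
import threading


class FdRegistry:
    """Hypothetical bookkeeping: one generation counter per fd number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._generation = {}  # fd number -> generation of the latest open
        self._open = set()     # fd numbers currently considered open

    def register(self, fd):
        # Call right after the fd is created; returns a handle (fd, generation).
        with self._lock:
            gen = self._generation.get(fd, 0) + 1
            self._generation[fd] = gen
            self._open.add(fd)
            return (fd, gen)

    def invalidate(self, handle):
        # Call as part of close(); a stale handle cannot close a newer fd.
        fd, gen = handle
        with self._lock:
            if self._generation.get(fd) == gen:
                self._open.discard(fd)

    def is_valid(self, handle):
        # The cross-check: same fd number AND same generation AND still open.
        fd, gen = handle
        with self._lock:
            return fd in self._open and self._generation.get(fd) == gen


def demo():
    registry = FdRegistry()
    old = registry.register(7)   # first server socket gets fd 7
    registry.invalidate(old)     # it is closed ...
    new = registry.register(7)   # ... and the number is reused
    return registry.is_valid(old), registry.is_valid(new)
```

A thread woken up from select/poll would call is_valid() on its handle
before calling accept(), and give up if the fd now belongs to someone
else. Note that the check and the subsequent syscall are not atomic in
this sketch; closing would have to be serialized against them to rule
out the race completely.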


Guilhem.

P.S.: I wonder whether this can also happen for files accessed over the
network, through NFS for example. Probably most blocking I/O calls need
to be cross-checked.

