On 09:20, Gerrit Renker wrote:

> thank you for continued testing. I spent a long while yesterday evening isolating
> possible causes. Here are further pointers
>
> 1. Can you please tell us which kernel you are using?

I checked 2.6.19.x (which works), 2.6.20, 2.6.20.1 and 2.6.21-rc1 (which all
have the bug). 2.6.21-rc2 wouldn't compile on my system due to some APIC issue.

> If it is a very recent
> one, can you please check whether you get the following message in your
> syslog:
>     "[...] listen_overflow!"
>
> If yes (dccp_debug should be turned on), then very likely setting listen(fd, 1)
> instead of listen(fd,0) may remove any strange effects.
> The reason is a recent change in sk_accept_queue_is_full which causes a different
> treatment of zero-sized listen-accept queues.

Will test this evening and report tomorrow.

> 2. With the most recent davem-2.6 kernel I was not able to reproduce this bug. It
> should, after some more thought, really make no difference whether you are using
> loopback (127.0.0.1) or not.

I can try this kernel as well. I'm currently downloading

	git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git

I hope that's the right one.

> 3. I analyzed the reverted patch you identified. There is indeed a loophole (which
> has not become visible so far), hence I will send
> (a) an update of this patch
> (b) a second patch to do with timer initialisation of child sockets.
> In particular (a) may help.

OK. Are these patches against the current linus-tree?

> 4. It may be worth trying a different application, e.g.
>
> http://www.erg.abdn.ac.uk/users/gerrit/dccp/apps/ttcp_dccp.tar.gz
> in order to find out which combination of system calls triggers the bug condition.

Will test and report.

> I managed to get the paraslash application built, but could not figure out how to
> populate the user lists and required configuration files.

That's explained in the INSTALL file.
But you don't need any configuration files and I _think_ you don't even
need a paraslash user to reproduce the bug; an empty
~/.paraslash/server.users should do. Just start para_server with the
autoplay (-a) option, i.e.

	para_server -a --random_dir=/some/dir/containing/an/mp3/file

Then

	para_recv -r dccp

triggers the bug.

> I don't understand your code fully yet, but with the more recent stack trace I
> was wondering whether this has to do with setting the listen socket non-blocking
> (mark_fd_nonblock), which is done both in sender and receiver.

IMHO it's considered good practice to set all fds which are used for
select() to non-blocking mode. AFAIR the reason is the situation where
a network packet arrives but is discarded because of a checksum error.
In this case it might happen that select() indicates readability of an
fd, but a subsequent read() blocks nevertheless.

Maybe it's unnecessary to set an fd to non-blocking mode if it is only
used for writing. But it won't hurt either, so...

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe
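
P.S. To illustrate the select() point above, here is a minimal sketch of
the pattern I mean. It is not the actual paraslash code: the names
set_nonblock_sketch(), read_when_ready_sketch() and READ_RETRY are made up
for this mail, and the error handling is reduced to the bare minimum. The
idea is only that, with O_NONBLOCK set, a read() after a misleading
"readable" report from select() fails with EAGAIN instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical return code: nothing to read right now, call again later. */
#define READ_RETRY (-2)

/* Hypothetical helper, similar in spirit to mark_fd_nonblock(). */
static int set_nonblock_sketch(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Wait (up to *tv, or forever if tv is NULL) until select() reports fd
 * as readable, then try to read from it.
 *
 * Returns the number of bytes read, 0 on EOF, -1 on error, and
 * READ_RETRY if select() timed out or if read() found no data even
 * though select() claimed the fd was readable.
 */
static ssize_t read_when_ready_sketch(int fd, void *buf, size_t len,
		struct timeval *tv)
{
	fd_set rfds;
	ssize_t ret;
	int n;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	n = select(fd + 1, &rfds, NULL, NULL, tv);
	if (n < 0)
		return -1;
	if (n == 0)
		return READ_RETRY; /* timeout */
	ret = read(fd, buf, len);
	if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return READ_RETRY; /* readiness was stale, do not block */
	return ret;
}
```

Without the O_NONBLOCK flag, the read() in the second function could
hang in exactly the case described above, which is why I set the flag on
every fd that goes into an fd set.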