Re: kernel BUG at kernel/timer.c:407!

Hi Andre,

thank you for the continued testing. I spent a long while yesterday evening
isolating possible causes. Here are some further pointers:

1. Can you please tell us which kernel you are using? If it is a very recent
   one, can you please check whether you get the following message in your
   syslog:
                "[...] listen_overflow!"

   If so (this requires dccp_debug to be turned on), then calling listen(fd, 1)
   instead of listen(fd, 0) will very likely remove the strange effects.
   The reason is a recent change in sk_acceptq_is_full, which causes zero-sized
   listen/accept queues to be treated differently.

2. With the most recent davem-2.6 kernel I was not able to reproduce this bug.
   After some more thought, it should really make no difference whether or not
   you are using loopback (127.0.0.1).

3. I analyzed the reverted patch you identified. There is indeed a loophole (which
   has not become visible so far), hence I will send
   (a) an update of this patch
   (b) a second patch dealing with the timer initialisation of child sockets.
   In particular, (a) may help.

4. It may be worth trying a different application, e.g. 

       http://www.erg.abdn.ac.uk/users/gerrit/dccp/apps/ttcp_dccp.tar.gz     
   in order to find out which combination of system calls triggers the bug condition.
   I managed to get the paraslash application built, but could not figure out how to
   populate the user lists and required configuration files. 
   I don't understand your code fully yet, but given the more recent stack trace I
   wonder whether this has to do with setting the listen socket non-blocking
   (mark_fd_nonblock), which is done in both sender and receiver.
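
To make points 1 and 4 concrete, here is a minimal user-space sketch of the
socket setup under discussion: a listening socket whose backlog is passed in
explicitly (1 rather than 0) and which is optionally set non-blocking via
fcntl(). The helper name make_listener and its parameters are hypothetical;
this is not the paraslash code, and DCCP-specific setup (such as a service
code) is left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from paraslash or the kernel tree): create a
 * socket bound to an ephemeral loopback port, optionally mark it
 * non-blocking, and listen with the given backlog.
 * Returns the descriptor, or -1 on failure.
 */
static int make_listener(int type, int proto, int backlog, int nonblock)
{
	struct sockaddr_in addr;
	int fd, flags;

	assert(backlog >= 0);
	fd = socket(AF_INET, type, proto);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel choose a free port */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	if (nonblock) {
		/* Same effect as a mark_fd_nonblock-style helper. */
		flags = fcntl(fd, F_GETFL);
		if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
			goto fail;
	}

	/* backlog = 1 instead of 0 is the workaround suggested in point 1. */
	if (listen(fd, backlog) < 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```

On a system whose headers define them, a DCCP test would call this as
make_listener(SOCK_DCCP, IPPROTO_DCCP, 1, 1); toggling the last two arguments
one at a time may help to narrow down which of the two settings matters.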


Again, many thanks for providing such detailed information.
Gerrit

Quoting Andre Noll:
| The bug remains, but the backtrace is slightly different,
|  see below.
|  
|  > > The BUG is caused via the following chain: 
|  > > 
|  > > 1. dccp_write_xmit(sk, 0) (due to !block)
|  > > 1. dccp_sendmsg
|  > > 2. ccid2_hc_tx_send_packet -> with hctx->ccid2hctx_pipe >= hctx->ccid2hctx_cwnd
|  > >    (see above, pipe=cwnd=1) ==> returns 1
|  > > 3. in dccp_write_xmit(sk, 0):
|  > >    if (!block) {                 /* this is true here */
|  > > 		sk_reset_timer(sk, &dp->dccps_xmit_timer,
|  > >    				msecs_to_jiffies(err)+jiffies)
|  > >    ==> BUG()
|  > > |   <7>dccp_set_state: listen(c1580030) LISTEN     -> CLOSED
|  > > This may be a clue: this socket has not gone past listen state (i.e. not entered server)
|  > 
|  > Yes, the bug happens in para_server just at the moment the first client
|  > connects. No data is transferred to the client. I'll look into the kernel
|  > dccp code a bit this evening as well.
|  
|  Found nothing suspicious. Apparently, dccp_connect() in
|  net/dccp/output.c is never called as this is the only place where
|  dp->dccps_xmit_timer.function is set, and the BUG in kernel/timer.c
|  indicates that this function pointer is NULL.
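
The diagnosis quoted above (a timer armed while its function pointer is still
NULL) can be modelled in user space. The sketch below is purely illustrative:
struct model_timer and the model_* functions are made-up stand-ins, not the
kernel's timer API, and the -1 return value takes the place of the BUG that
the kernel raises.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a kernel timer (illustrative only). */
struct model_timer {
	void (*function)(unsigned long);	/* callback run on expiry */
	unsigned long data;			/* argument for the callback */
	unsigned long expires;			/* expiry time in "jiffies" */
	int pending;				/* non-zero once armed */
};

/*
 * Arm the timer. Returns -1 where, according to the analysis above, the
 * kernel would hit the BUG: the timer has no callback installed.
 */
static int model_mod_timer(struct model_timer *t, unsigned long expires)
{
	assert(t != NULL);
	if (t->function == NULL)
		return -1;
	t->expires = expires;
	t->pending = 1;
	return 0;
}

/* Placeholder callback; a real one would retry the transmission. */
static void model_xmit_timer_fn(unsigned long data)
{
	(void)data;
}

/*
 * The step that is skipped when the only initialisation site is never
 * reached: install the callback before anything can arm the timer.
 */
static void model_init_xmit_timer(struct model_timer *t, unsigned long sk)
{
	t->function = model_xmit_timer_fn;
	t->data = sk;
	t->pending = 0;
}
```

In this model, arming a zero-initialised timer fails, while arming it after
model_init_xmit_timer() succeeds. This corresponds to the situation where a
socket that never went through dccp_connect() reaches the sk_reset_timer()
call in dccp_write_xmit().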