Andi Kleen wrote: > > > Incase you can not see this, it appears that the addresses > > of each skb are alright, so the assumption is that the skb > > passed to tcp_fragment() has been unlinked while > > tcp_fragment() was doing its thing. This implies a need for > > locking at some higher level and we don't know enough about > > the tcp code to divine where this might best be done. > > 2.4 TCP should in theory already have enough locking to prevent this > (the socket lock that is aquired by timers and user context socket users) > > -Andi Here is another oops, not quite the same, AND with an assert failure ahead of it. I append the whole report and some and some observations: We had two more panics over the weekend. Here is the analysis from one of them. ---------comments from Dave Howell-------------- Looking at the sysint4l dump, some observations: - Panic was due to an Oops (Null pointer dereference kernel incident) - Full system configuration is in kernel startup logs (memory, disks, chipsets, etc) - Last part of kernel log has oops info, follows kernel assertion failed warning: <4>KERNRL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c <============== (174):inet_sock_destruct <1>Unable to handle kernel NULL pointer dereference at virtual address 00000049 <4> printing eip: <4>c0255196 <1>*pde = 00000000 <4>Oops: 0000 <4>CPU: 1 <4>EIP: 0010:[<c0255196>] Not tainted <4>EFLAGS: 00010213 <4>eax: c6ace4c8 ebx: 00000000 ecx: 00000004 edx: 00000000 <4>esi: c6ace538 edi: c6ace460 ebp: 00000026 esp: c1219eb4 <4>ds: 0018 es: 0018 ss: 0018 <4>Process swapper (pid: 0, stackpage=c1219000) <4>Stack: c6ace460 c6ace538 c6ace460 004ec3ef c025de3e c6ace460 00000000 c72011a0 <4> c1218050 004ec2d2 c02395b2 c6ace460 c6ace538 c1218000 004ec3ef c025e056 <4> c6ace460 c1218000 00000046 004ebfe7 00000000 c1218000 00cf70a0 c0128eaa <4>Call Trace: [<c025de3e>] [<c02395b2>] [<c025e056>] [<c0128eaa>] [<c025df70>] <4> [<c0128fad>] [<c012483b>] [<c0109704>] [<c0105490>] [<c0105490>] [<c0105490>] <4> [<c01054bc>] [<c0105542>] [<c011d51b>] [<c011d8ad>] <4> <4>Code: 0f b6 4b 49 45 f6 c1 82 74 0c 31 d2 89 96 78 01 00 00 0f b6 - Finally at the bottom of the trace the active backtrace, a bit suspect because it's on the interrupt side (not trace but process it's attributed to). =========================== STACK TRACE OF FAILING TASK =========================== ================================================================ STACK TRACE FOR TASK: 0xc1218000 (swapper) 0 tcp_enter_loss+198 [0xc0255196] 1 tcp_retransmit_timer+473 [0xc025de39] 2 tcp_write_timer+225 [0xc025e051] 3 timer_bh+710 [0xc0128ea6] 4 timer_softirq+40 [0xc0128fa8] 5 do_softirq+185 [0xc0124839] 6 do_IRQ+511 [0xc01096ff] 7 do_IRQ+511 [0xc01096ff] TRACE ERROR 0x1 ================================================================ - In comparison with previous dump looks like the same upstream event occured, with a timer bottom half running and invoking the tcp_retransmit_timer. Last one caught it oopsing in the tcp_fragment code, this is a bit different but the upstream path there is the same. - Same pile of unknown symbol references bringing up dump manually in lcrash, must be corrupt or wrong system.0 or kerntypes.0. Needs a look. - Dumped tcp_enter_loss+0 to tcp_enter_loss+200 to see site at tcp_enter_loss+198. Code at this site is: movzbl 0x49(%ebx),%ecx %ebx is NULL at this point (see above), hence the oops at 00000049. Code for function is in net/ipv4/tcp_input.c starting at line 987. - The failure is in the loop starting at line 1002: for_retrans_queue(skb, sk, tp) { cnt++; if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS) tp->undo_marker = 0; TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED; if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) || how) { TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED; TCP_SKB_CB(skb)->sacked |= TCPCB_LOST; tp->lost_out++; } else { tp->sacked_out++; tp->fackets_out = cnt; } } I didn't fully map the code but think that the expansion of: if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS) is where the zeroed pointer is used. Looks like the intent is that skp is the iterater variable to loop through the retrans_queue and it got the zero value set on some iteration, not the first. So my guess is a corrupted queue element pointer being picked up and used. - I still would look upstream at the timer bottom half invocation as in both of the dumps this upstream trace is present, and it seems like an exception path for a timeout that leads to a retransmit. - Also needs a look is the kernel assertion that failed and likely led to the oops, looks a lot like an allocation failed and returned a NULL value, this would be my top culprit to pursue. Code from af_net.c at line 174: void inet_sock_destruct(struct sock *sk) { __skb_queue_purge(&sk->receive_queue); __skb_queue_purge(&sk->error_queue); if (sk->type == SOCK_STREAM && sk->state != TCP_CLOSE) { printk("Attempt to release TCP socket in state %d %p\n", sk->state, sk); return; } if (!sk->dead) { printk("Attempt to release alive inet socket %p\n", sk); return; } BUG_TRAP(atomic_read(&sk->rmem_alloc) == 0); BUG_TRAP(atomic_read(&sk->wmem_alloc) == 0); <<-- assert reported here BUG_TRAP(sk->wmem_queued == 0); BUG_TRAP(sk->forward_alloc == 0); if (sk->protinfo.af_inet.opt) kfree(sk->protinfo.af_inet.opt); Continuing on after this likely led to the oops that killed us. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml - : send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html