Re: [Bug 6009] tcpdump causes kernel panic

Andrew Morton <akpm@xxxxxxxx> · Sat, 4 Feb 2006 21:04:31 -0800

bugme-daemon@xxxxxxxxxxxxxxxxxxx wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6009
> 
> ------- Additional Comments From djekels@xxxxxxxxxxxxxx  2006-02-04 18:42 -------
> James,
> 
> Initially, the kernel panicked after we had run our multicast C++ application
> for about 20 minutes. What I accidentally discovered was
> that if I run tcpdump the panic occurs much faster, about 2 minutes from
> start to the panic.
> I upgraded the firmware on the 3ware SATA array controller and the device
> driver, 3w-xxx.ko, per the instructions of the 3ware developers.
> Even with this new firmware I get the same results.
> 

No, there's no kernel panic here.

What we have is two things:

a) A kernel _warning_, telling us that we're doing illegal things from
   softirq context in the scsi stack.

   This is a known bug.  It's possible that the _probability_ of this
   happening is increased when there's a lot of network traffic happening,
   because that causes more softirq activity.

b) The 3ware driver is shitting itself:

messages.4:Jan  6 12:15:41 chilsp010 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x0053): Battery capacity test is overdue:.
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi0 : 3ware 9000 Storage Controller
messages.4:Jan  6 12:15:41 chilsp010 kernel: 3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xfeaffc00, IRQ: 217.
messages.4:Jan  6 12:15:41 chilsp010 kernel: 3w-9xxx: scsi0: Firmware FE9X 2.06.00.009, BIOS BE9X 2.03.01.051, Ports: 8.
messages.4:Jan  6 12:15:41 chilsp010 kernel:   Vendor: AMCC      Model: 9500S-8    DISK   Rev: 2.06
messages.4:Jan  6 12:15:41 chilsp010 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 03
messages.4:Jan  6 12:15:41 chilsp010 kernel: SCSI device sda: 390602752 512-byte hdwr sectors (199989 MB)
messages.4:Jan  6 12:15:41 chilsp010 kernel: SCSI device sda: drive cache: write back, no read (daft)
messages.4:Jan  6 12:15:41 chilsp010 kernel:  sda: sda1 sda2
messages.4:Jan  6 12:15:41 chilsp010 kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi: On host 0 channel 0 id 0 only 511 (max_scsi_report_luns) of 493425154 luns reported, try increasing max_scsi_report_luns.
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi: host 0 channel 0 id 0 lun 0xb0b800008ed88ec0 has a LUN larger than currently supported.
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi: host 0 channel 0 id 0 lun 0xfbbe007cbf0006b9 has a LUN larger than currently supported.
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi: host 0 channel 0 id 0 lun 0x0002f3a4ea210600 has a LUN larger than currently supported.
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi: host 0 channel 0 id 0 lun 0x00bebe073804750b has a LUN larger than currently supported.
messages.4:Jan  6 12:15:41 chilsp010 kernel: scsi: host 0 channel 0 id 0 lun 0x83c61081fefe0775 has a LUN larger than currently supported.

I don't know why b) is happening.

Can you please confirm that the occurrence of b) is increased if there's a
tcpdump happening?  I don't believe that's the case, because b) happened at
boot.

In other words, we have two completely unrelated bugs.  Do you agree?
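
If it helps to check the "b) happened at boot" part against the full logs,
here is a rough sketch, not a tested tool.  It assumes classic syslog lines
like the ones quoted above (optionally with a grep file-name prefix such as
"messages.4:"), and it treats "same timestamp as the controller probe
message" as a stand-in for "at boot".  The function names are made up for
illustration; the two marker strings are copied from the excerpt.

```python
import re

# Classic syslog line, optionally prefixed by a grep file name
# ("messages.4:").  This layout is an assumption based on the excerpt.
LINE_RE = re.compile(
    r"^(?:[^:\s]+:)?(?P<stamp>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s(?P<msg>.*)$"
)

# Marker strings copied from the log excerpt above.
PROBE_MARK = "Found a 3ware 9000 Storage Controller"
BOGUS_MARK = "has a LUN larger than currently supported"


def classify(lines):
    """Return (probe_stamps, bogus_stamps) as lists of timestamp strings."""
    probe, bogus = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel syslog line in the assumed layout
        if PROBE_MARK in m.group("msg"):
            probe.append(m.group("stamp"))
        elif BOGUS_MARK in m.group("msg"):
            bogus.append(m.group("stamp"))
    return probe, bogus


def bogus_only_at_probe(lines):
    """True if bogus-LUN messages exist and all share a probe timestamp."""
    probe, bogus = classify(lines)
    return bool(bogus) and set(bogus) <= set(probe)
```

Pass it any iterable of log lines (an open file works).  If
bogus_only_at_probe() returns False while tcpdump was running, then b) is
showing up at times other than driver probe, and that would be worth knowing.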