On Wed, Feb 08, James Smart wrote:
> This patch set updates the lpfc driver to revision 8.1.2, which includes

James, we have this driver now. Today I got this crash during bootup on a p620 (4 RS64 CPUs). It ran -git9 before and now runs -git11. A quick look at the changes did not show anything related. Both cpu1 and cpu3 had an invalid data access at the same time, and I have no idea which one came first. It came up on the second try. Maybe there is an obvious error in the new code, or maybe the 303 patches on top of Linus' tree are bad.

...
Linux version 2.6.16-rc2-git11-20060212084234-ppc64 (geeko@buildhost) (gcc version 4.1.0 20060210 (prerelease) (SUSE Linux)) #1 SMP Sun Feb 12 08:42:34 UTC 2006
...
Kernel command line: root=/dev/md0 xmon=on kdb=on sysrq=1 selinux=0 elevator=cfq splash=silent desktop
...
Loading scsi_mod
SCSI subsystem initialized
Loading sd_mod
Loading scsi_transport_spi
Loading sym53c8xx
sym0: <896> rev 0x7 at pci 0000:01:01.0 irq 35
sym0: No NVRAM, ID 7, Fast-40, SE, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.2
  Vendor: IBM Model: CDRM00203 !K Rev: 1_05
  Type: CD-ROM ANSI SCSI revision: 02
target0:0:1: Beginning Domain Validation
target0:0:1: asynchronous
target0:0:1: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 15)
target0:0:1: Domain Validation skipping write tests
target0:0:1: Ending Domain Validation
target0:0:2: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 31)
  Vendor: IBM Model: ST318305LC Rev: C505
  Type: Direct-Access ANSI SCSI revision: 03
target0:0:2: tagged command queuing enabled, command queue depth 16.
target0:0:2: Beginning Domain Validation
target0:0:2: asynchronous
target0:0:2: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 31)
target0:0:2: Domain Validation skipping write tests
target0:0:2: Ending Domain Validation
sr0: scsi-1 drive
SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through w/ FUA
 sda: sda1 sda2
sd 0:0:2:0: Attached scsi disk sda
Uniform CD-ROM driver Revision: 3.20
scsi_id[1068]: ssr 0:0:1:0: Attached scsi generic sg0 type 5 csi_id: unable tsd 0:0:2:0: Attached scsi generic sg1 type 0 o access parent device of '/block/sda'
sym1: <896> rev 0x7 at pci 0000:01:01.1 irq 34
sym1: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: SCSI BUS has been reset.
scsi1 : sym-2.2.2
Loading scsi_transport_fc
Loading lpfc
Emulex LightPulse Fibre Channel SCSI driver 8.1.2
Copyright(c) 2004-2006 Emulex. All rights reserved.
scsi2 : on PCI bus 21 device 08 irq 55
lpfc 0001:21:01.0: 0:1303 Link Up Event x1 received Data: x1 x1 x4 xa9
  Vendor: IBM Model: 2105F20 Rev: 1.94
  Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdb: 12500032 512-byte hdwr sectors (6400 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
SCSI device sdb: 12500032 512-byte hdwr sectors (6400 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3
md: raid0 personality registered for level 0
scsi_sid[1186]: d 2:0:0:0: Attached scsi disk sdb csi_id: unable tsd 2:0:0:0: Attached scsi generic sg2 type 0 o access parent Vendor: device of '/blocIk/sdb' WaitingBM for udev to set tle: scsi_id[112]: scsi_id: una ble to access pa rent device of Model: /block/sdb' 2105F20 Rev: 1.94
  Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdc: 12500032 512-byte hdwr sectors (6400 MB)
Decbpuug: 0x3 :s Velecteorpi: 30n0 (gD afutanc Atcicesos)n acat l[cl0e00d 00fr00o3fm bib2envba0] id c o pnct: ecx0t0 a00t0 i00n00c052lu22dce/: l.inreusxc/hpaedg_etmaaps.k+0hx:314/06xc08 l lr: c000000000053adc: .try_to_wake_up+0x4a8/0x51c
sp: c00000003fbb3130
msr: a000000000001032
dar: c00001800048acb0
dsisr: 40000000
current = 0xc0000000034d07f0
paca = 0xc00000000048b280
pid = 1167, comm = lpfc_worker_0
enter ? for help
[c00000003fbb31b0] c000000000053adc .try_to_wake_up+0x4a8/0x51c
[c00000003fbb3290] c000000000051c4c .__wake_up_common+0x68/0xe0
[c00000003fbb3340] c000000000055214 .__wake_up+0x54/0x88
[c00000003fbb33f0] c0000000002daafc .sock_def_readable+0x54/0xa8
[c00000003fbb3480] c00000000030115c .netlink_broadcast+0x344/0x50c
[c00000003fbb3560] c0000000001c6378 .kobject_uevent+0x41c/0x4dc
[c00000003fbb3650] c000000000251c94 .class_device_add+0x240/0x398
[c00000003fbb3700] c000000000254f74 .attribute_container_add_class_device+0x18/0x50
[c00000003fbb3780] c0000000002556c8 .transport_add_class_device+0x24/0x68
[c00000003fbb3810] c000000000254e3c .attribute_container_device_trigger+0x124/0x1c8
[c00000003fbb38d0] c0000000002555e0 .transport_add_device+0x1c/0x34
[c00000003fbb3950] d000000000052bb0 .fc_rport_create+0x270/0x36c [scsi_transport_fc]
[c00000003fbb3a10] d0000000001504c4 .lpfc_nlp_list+0x988/0xaf8 [lpfc]
[c00000003fbb3af0] d000000000158f68 .lpfc_cmpl_reglogin_reglogin_issue+0x150/0x174 [lpfc]
[c00000003fbb3b80] d000000000157a58 .lpfc_disc_state_machine+0xd4/0x1c8 [lpfc]
[c00000003fbb3c20] d000000000151510 .lpfc_mbx_cmpl_reg_login+0x44/0x94 [lpfc]
[c00000003fbb3cc0] d000000000144b40 .lpfc_sli_handle_mb_event+0x418/0x5c8 [lpfc]
[c00000003fbb3da0] d000000000152550 .lpfc_do_work+0x1c0/0xbf8 [lpfc]
[c00000003fbb3ee0] c000000000079250 .kthread+0x128/0x178
[c00000003fbb3f90] c000000000024adc .kernel_thread+0x4c/0x68
3:mon> cpu 0x1: Vector: 300 (Data Access) at [c00000002f806d30]
pc: c00000000000ecdc: .validate_sp+0x30/0x88
lr: c00000000000ee14: .show_stack+0xe0/0x1b0
sp: c00000002f806fb0
msr: a000000000001032
dar: c000000600627b20
dsisr: 40000000
current = 0xc0000000fe9587f0
paca = 0xc00000000048ae80
pid = 1205, comm = vol_id
...
3:mon> e
cpu 0x3: Vector: 300 (Data Access) at [c00000003fbb2eb0]
pc: c00000000005222c: .resched_task+0x34/0xc0
lr: c000000000053adc: .try_to_wake_up+0x4a8/0x51c
sp: c00000003fbb3130
msr: a000000000001032
dar: c00001800048acb0
dsisr: 40000000
current = 0xc0000000034d07f0
paca = 0xc00000000048b280
pid = 1167, comm = lpfc_worker_0
3:mon> t
[c00000003fbb31b0] c000000000053adc .try_to_wake_up+0x4a8/0x51c
[c00000003fbb3290] c000000000051c4c .__wake_up_common+0x68/0xe0
[c00000003fbb3340] c000000000055214 .__wake_up+0x54/0x88
[c00000003fbb33f0] c0000000002daafc .sock_def_readable+0x54/0xa8
[c00000003fbb3480] c00000000030115c .netlink_broadcast+0x344/0x50c
[c00000003fbb3560] c0000000001c6378 .kobject_uevent+0x41c/0x4dc
[c00000003fbb3650] c000000000251c94 .class_device_add+0x240/0x398
[c00000003fbb3700] c000000000254f74 .attribute_container_add_class_device+0x18/0x50
[c00000003fbb3780] c0000000002556c8 .transport_add_class_device+0x24/0x68
[c00000003fbb3810] c000000000254e3c .attribute_container_device_trigger+0x124/0x1c8
[c00000003fbb38d0] c0000000002555e0 .transport_add_device+0x1c/0x34
[c00000003fbb3950] d000000000052bb0 .fc_rport_create+0x270/0x36c [scsi_transport_fc]
[c00000003fbb3a10] d0000000001504c4 .lpfc_nlp_list+0x988/0xaf8 [lpfc]
[c00000003fbb3af0] d000000000158f68 .lpfc_cmpl_reglogin_reglogin_issue+0x150/0x174 [lpfc]
[c00000003fbb3b80] d000000000157a58 .lpfc_disc_state_machine+0xd4/0x1c8 [lpfc]
[c00000003fbb3c20] d000000000151510 .lpfc_mbx_cmpl_reg_login+0x44/0x94 [lpfc]
[c00000003fbb3cc0] d000000000144b40 .lpfc_sli_handle_mb_event+0x418/0x5c8 [lpfc]
[c00000003fbb3da0] d000000000152550 .lpfc_do_work+0x1c0/0xbf8 [lpfc]
[c00000003fbb3ee0] c000000000079250 .kthread+0x128/0x178
[c00000003fbb3f90] c000000000024adc .kernel_thread+0x4c/0x68
3:mon> r
R00 = c00000000048ac80 R16 = 0000000000000000
R01 = c00000003fbb3130 R17 = 0000000000000000
R02 = c000000000625be0 R18 = 0000000000000000
R03 = c0000000fe9587f0 R19 = c00000000fdbca08
R04 = c000000004a55140 R20 = 0000000000000000
R05 = 0000000000000000 R21 = 0000000000000001
R06 = c000000004a6cf70 R22 = 0000000000000001
R07 = c000000003393870 R23 = a000000000001032
R08 = c0000000fe9587f0 R24 = 0000000000000003
R09 = c00001800048ac80 R25 = c0000000035a87f0
R10 = c000000004a55850 R26 = 0000000000000001
R11 = c0000000004341c0 R27 = 0000000000000001
R12 = 0000000000000000 R28 = c000000004a54f70
R13 = c00000000048b280 R29 = 0000000000000001
R14 = 0000000000000000 R30 = c0000000004c7e78
R15 = 0000000000000000 R31 = c000000004a550c0
pc = c00000000005222c .resched_task+0x34/0xc0
lr = c000000000053adc .try_to_wake_up+0x4a8/0x51c
msr = a000000000001032 cr = 24000088
ctr = c000000000053b50 xer = 0000000020000000 trap = 300
dar = c00001800048acb0 dsisr = 40000000
cpu2 idle.
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c00000002f806d30]
pc: c00000000000ecdc: .validate_sp+0x30/0x88
lr: c00000000000ee14: .show_stack+0xe0/0x1b0
sp: c00000002f806fb0
msr: a000000000001032
dar: c000000600627b20
dsisr: 40000000
current = 0xc0000000fe9587f0
paca = 0xc00000000048ae80
pid = 1205, comm = vol_id
1:mon> t
[link register ] c00000000000ee14 .show_stack+0xe0/0x1b0
[c00000002f806fb0] c00000000000ee00 .show_stack+0xcc/0x1b0 (unreliable)
[c00000002f807050] c0000000001cb214 ._raw_spin_lock+0x120/0x164
[c00000002f8070e0] c00000000036bf44 ._spin_lock+0x10/0x24
[c00000002f807160] c00000000005440c .scheduler_tick+0xf4/0x3ec
[c00000002f807210] c00000000006aa44 .update_process_times+0x7c/0xa8
[c00000002f8072a0] c000000000021100 .timer_interrupt+0x94/0x404
[c00000002f807380] c0000000000034b4 decrementer_common+0xb4/0x100
--- Exception: 901 (Decrementer) at c00000000005cd64 .release_console_sem+0x1c4/0x284
[c00000002f807720] c00000000005d7dc .vprintk+0x330/0x388
[c00000002f807840] c00000000005d86c .printk+0x38/0x48
[c00000002f8078d0] c0000000000524e4 .__might_sleep+0x98/0xf4
[c00000002f807950] c000000000092468 .do_generic_mapping_read+0x1fc/0x4dc
[c00000002f807aa0] c00000000009307c .__generic_file_aio_read+0x184/0x22c
[c00000002f807b70] c00000000009487c .generic_file_read+0x94/0xcc
[c00000002f807cf0] c0000000000c42f0 .vfs_read+0x118/0x1fc
[c00000002f807d90] c0000000000c47d0 .sys_read+0x4c/0x8c
[c00000002f807e30] c0000000000086f8 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff5cc68
SP (ffd46a00) is in userspace
1:mon> r
R00 = 0000000600000000 R16 = c00000002f807e08
R01 = c00000002f806fb0 R17 = 00000000000200e3
R02 = c000000000625be0 R18 = c0000000005eaa90
R03 = c00000002f807d90 R19 = c0000000fed52c50
R04 = c0000000fe9587f0 R20 = 0000000000000000
R05 = 00000000000002f0 R21 = c00000002f807b10
R06 = c00000002f806fe8 R22 = 00000000000200e4
R07 = 0000000000080000 R23 = 0000000000000800
R08 = 0000000000002b02 R24 = c000000000629f50
R09 = c000000000627b20 R25 = 0000000000000001
R10 = 0000000000000000 R26 = c00000002f807e30
R11 = c00000002f804000 R27 = 000000000000000f
R12 = 0000000000000020 R28 = c00000000005cd60
R13 = c00000000048ae80 R29 = c00000002f807d90
R14 = 00000000000200e3 R30 = c0000000fe9587f0
R15 = c00000003fa1c508 R31 = c0000000000c47d0
pc = c00000000000ecdc .validate_sp+0x30/0x88
lr = c00000000000ee14 .show_stack+0xe0/0x1b0
msr = a000000000001032 cr = 88022444
ctr = 0000000000000000 xer = 0000000000000000 trap = 300
dar = c000000600627b20 dsisr = 40000000
cpu0 idle

Looking into dmesg, it is no longer clear whether lpfc is at fault:

<6>md: raid0 personality registered for level 0
<5>sd 2:0:0:0: Attached scsi disk sdb
<5>sd 2:0:0:0: Attached scsi generic sg2 type 0
<5> Vendor: IBM Model: 2105F20 Rev: 1.94
<5> Type: Direct-Access ANSI SCSI revision: 03
<5>SCSI device sdc: 12500032 512-byte hdwr sectors (6400 MB)
<3>Debug: sleeping function called from invalid context at include/linux/pagemap.h:168
<1>Unable to handle kernel paging request for data at address 0xc00001800048acb0
<1>Faulting instruction address: 0xc00000000005222c
<0>BUG: spinlock lockup on CPU#1, vol_id/1205, c000000004a550c0
<4>Call Trace:
<4>[C00000002F806FB0] [C00000000000ED9C] .show_stack+0x68/0x1b0 (unreliable)
<4>[C00000002F807050] [C0000000001CB214] ._raw_spin_lock+0x120/0x164
<4>[C00000002F8070E0] [C00000000036BF44] ._spin_lock+0x10/0x24
<4>[C00000002F807160] [C00000000005440C] .scheduler_tick+0xf4/0x3ec
<4>[C00000002F807210] [C00000000006AA44] .update_process_times+0x7c/0xa8
<4>[C00000002F8072A0] [C000000000021100] .timer_interrupt+0x94/0x404
<4>[C00000002F807380] [C0000000000034B4] decrementer_common+0xb4/0x100
<4>--- Exception: 901 at .release_console_sem+0x1c4/0x284
<4> LR = .release_console_sem+0x1c0/0x284
<4>[C00000002F807720] [C00000000005D7DC] .vprintk+0x330/0x388
<4>[C00000002F807840] [C00000000005D86C] .printk+0x38/0x48
<4>[C00000002F8078D0] [C0000000000524E4] .__might_sleep+0x98/0xf4
<4>[C00000002F807950] [C000000000092468] .do_generic_mapping_read+0x1fc/0x4dc
<4>[C00000002F807AA0] [C00000000009307C] .__generic_file_aio_read+0x184/0x22c
<4>[C00000002F807B70] [C00000000009487C] .generic_file_read+0x94/0xcc
<4>[C00000002F807CF0] [C0000000000C42F0] .vfs_read+0x118/0x1fc
<4>[C00000002F807D90] [C0000000000C47D0] .sys_read+0x4c/0x8c
<1>Unable to handle kernel paging request for data at address 0xc000000600627b20
<1>Faulting instruction address: 0xc00000000000ecdc
0:mon>

-- 
short story of a lazy sysadmin:
 alias appserv=wotan