On Tue, 2007-04-24 at 11:52 +0300, Constantin Teodorescu wrote:
> Hello, I hope I can get a little help from you regarding this kind of
> crash!
>
> Hardware:
> - server, TYAN Tempest i5000VS S5372 BIOS v1.0.4
> - 8 SATA drives Seagate 136 GB attached to an AIC-9410 controller
> - one IDE (boot disk and system)

This configuration doesn't work on the vanilla Linux kernel ... you need
the scsi-aic94xxx-sas-2.6 tree as well for this; is that what you're
running with?

> - 8 GB RAM
>
> Software:
> - OpenSUSE 10.2 x86_64 (tried also with SLES 10 but didn't succeed in
> compiling adp94xx driver from Adaptec)
>
> Kernels: I tried with each of them: linux-2.6.20.1, linux-2.6.20.4,
> linux-2.6.20.7, linux-2.6.21.rc7
> The last one has the 1.0.3 version of aic94xx driver but the results are
> the same :-(
>
> Description:
> - the server is running a very heavily loaded PostgreSQL database with
> tables spread on those SAS drives, a lot of writes and reads

Are these SAS or SATA drives?

> - at least 4, 5 times a day I got some warnings in /var/log/messages
> (sas: Enter sas_scsi_recover_host , trying to find task XXX --->
> aic94xx: came back from clear nexus) but the system is still working
> - more rarely (once per day) I got the following bug in
> /var/log/messages and the system is crashed, SAS drivers are not working
> anymore, shutdown command is waiting forever, need to hardware reset the
> system
>
>
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e2c0, task 0xffff81005bfcb080, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810047f9dd00, task 0xffff81007df80cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31180, task 0xffff8101247ad500, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81021b8af380, task 0xffff81012e550ac0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101698c3940, task 0xffff8101a3b69b80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865680, task 0xffff8101a3b69380, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37340, task 0xffff8101a3b69580, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff810164d31a40, task 0xffff810058a93dc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b940, task 0xffff81005bfcbc80, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37880, task 0xffff81015856bd00, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81022fa2f940, task 0xffff8101d2cf87c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100bc25b080, task 0xffff81005bfcb880, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37dc0, task 0xffff8101d186a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620640, task 0xffff81010d46a940, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae1c0, task 0xffff81012e9bf4c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8100531ae380, task 0xffff8101d186a740, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e8654c0, task 0xffff8101247ad100, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620480, task 0xffff81012e5502c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81000ce37180, task 0xffff8101d2cf89c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81017d5268c0, task 0xffff8101d186a540, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff8101c9f5e800, task 0xffff81015856b900, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81014f8db600, task 0xffff81007df808c0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81011e865bc0, task 0xffff81012e550cc0, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: command 0xffff81009c620100, task 0xffff8101a3b69980, timed out: EH_NOT_HANDLED
> Apr 24 07:22:20 bnd kernel: sas: Enter sas_scsi_recover_host
> Apr 24 07:22:20 bnd kernel: sas: trying to find task 0xffff81005bfcb080
> Apr 24 07:22:20 bnd kernel: sas: sas_scsi_find_task: aborting task 0xffff81005bfcb080
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf timed out
> Apr 24 07:22:25 bnd kernel: aic94xx: tmf came back
> Apr 24 07:22:25 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:25 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
> Apr 24 07:22:30 bnd kernel: aic94xx: asd_clear_nexus_timedout: here
> Apr 24 07:22:35 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: task not done, clearing nexus
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: PRE
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: POST
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_index: clear nexus posted, waiting...
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: here
> Apr 24 07:22:35 bnd kernel: aic94xx: asd_clear_nexus_tasklet_complete: opcode: 0x0
> Apr 24 07:22:40 bnd kernel: aic94xx: came back from clear nexus
> Apr 24 07:22:40 bnd kernel: ------------[ cut here ]------------
> Apr 24 07:22:40 bnd kernel: kernel BUG at drivers/scsi/aic94xx/aic94xx_hwi.h:354!

This is the attempted free of an in flight command.
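
To make that concrete, here is a toy model in Python. It is not the
aic94xx code, only a sketch under one assumption: that the free path
checks the command is no longer owned by the hardware and trips a BUG
if it is. All names (CommandBlock, Controller, submit, complete, free,
abort_and_free) are made up for the illustration.

```python
class CommandBlock:
    """Hypothetical per-command control block (not the driver's struct)."""

    def __init__(self, tag):
        self.tag = tag


class Controller:
    """Toy host adapter that tracks which commands the hardware still owns."""

    def __init__(self):
        self.pending = set()  # tags of commands that are still in flight

    def submit(self, cb):
        # Hand the command to the "hardware".
        self.pending.add(cb.tag)

    def complete(self, cb):
        # The hardware has finished with the command (or confirmed the abort).
        self.pending.discard(cb.tag)

    def free(self, cb):
        # Assumed invariant: a command may only be freed once the hardware
        # no longer owns it. Violating it is the analogue of the BUG above.
        if cb.tag in self.pending:
            raise RuntimeError(f"BUG: freeing in-flight command {cb.tag:#x}")


def abort_and_free(ctl, cb, abort_confirmed):
    """Error-handler path: free the command after an abort attempt."""
    if abort_confirmed:
        ctl.complete(cb)
    ctl.free(cb)  # raises if the abort never took effect
```

In the log the TMF times out and both clear nexus attempts report "task
not done", which in this model roughly corresponds to calling
abort_and_free(ctl, cb, abort_confirmed=False) on a submitted command.
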
> Apr 24 07:22:40 bnd kernel: invalid opcode: 0000 [1] SMP
> Apr 24 07:22:40 bnd kernel: CPU 0
> Apr 24 07:22:40 bnd kernel: Modules linked in: aic94xx libsas xfs
> Apr 24 07:22:40 bnd kernel: Pid: 3504, comm: scsi_eh_0 Not tainted 2.6.21-rc7_RC7 #1
> Apr 24 07:22:40 bnd kernel: RIP: 0010:[<ffffffff88089f51>] [<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP: 0000:ffff81023117fde0 EFLAGS: 00010287
> Apr 24 07:22:40 bnd kernel: RAX: 0000000000000000 RBX: ffff810231618000 RCX: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: RDX: ffffffff88089ebf RSI: ffff81005bfcb080 RDI: ffff81005bfcb098
> Apr 24 07:22:40 bnd kernel: RBP: 0000000000000000 R08: ffff81005bfcb080 R09: 0000000000000001
> Apr 24 07:22:40 bnd kernel: R10: ffffffff88089ea6 R11: ffff81013ba5bf80 R12: ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: R13: ffff810156e4f580 R14: ffff8101d49fb9c0 R15: ffff81022f66a800
> Apr 24 07:22:40 bnd kernel: FS: 0000000000000000(0000) GS:ffffffff80712000(0000) knlGS:0000000000000000
> Apr 24 07:22:40 bnd kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Apr 24 07:22:40 bnd kernel: CR2: 00002b110eff3fe8 CR3: 00000001e75f6000 CR4: 00000000000006e0
> Apr 24 07:22:40 bnd kernel: Process scsi_eh_0 (pid: 3504, threadinfo ffff81023117e000, task ffff810232274fe0)
> Apr 24 07:22:40 bnd kernel: Stack: ffff81023117dac8 00000000c9f5e2c0 ffff81023117fe50 ffff81005bfcb080
> Apr 24 07:22:40 bnd kernel: 0000000000000000 ffff8101c9f5e2c0 ffff81005bfcb098 ffffffff88073293
> Apr 24 07:22:40 bnd kernel: ffff810231618010 ffff81023046c000 ffff8102316181e0 ffff81023046c000
> Apr 24 07:22:40 bnd kernel: Call Trace:
> Apr 24 07:22:40 bnd kernel: [<ffffffff88073293>] :libsas:sas_scsi_recover_host+0x1c2/0x83b
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>] keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel: [<ffffffff80403b26>] scsi_error_handler+0x6e/0x2d7
> Apr 24 07:22:40 bnd kernel: [<ffffffff80403ab8>] scsi_error_handler+0x0/0x2d7
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023fa46>] kthread+0xd1/0x103
> Apr 24 07:22:40 bnd kernel: [<ffffffff8020a148>] child_rip+0xa/0x12
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f7d6>] keventd_create_kthread+0x0/0x6d
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023c327>] run_workqueue+0x10/0x179
> Apr 24 07:22:40 bnd kernel: [<ffffffff8023f975>] kthread+0x0/0x103
> Apr 24 07:22:40 bnd kernel: [<ffffffff8020a13e>] child_rip+0x0/0x12
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel:
> Apr 24 07:22:40 bnd kernel: Code: 0f 0b eb fe 48 8d bb 68 4b 00 00 e8 38 df 4a f8 41 8b 95 d0
> Apr 24 07:22:40 bnd kernel: RIP [<ffffffff88089f51>] :aic94xx:asd_abort_task+0x423/0x54a
> Apr 24 07:22:40 bnd kernel: RSP <ffff81023117fde0>

James