I've got an i386 server running centos 4.2 with 3 3ware controllers in it -- an 8006-2 for the system disks and 2 7500-8s. On the 7500s, I'm running an all software RAID50. This morning I came in to find the system hung. Turns out a disk went overnight on one of the 7500s, and rather than a graceful failover I got this: Jan 6 01:03:58 $SERVER kernel: 3w-xxxx: scsi2: Command failed: status = 0xc7,flags = 0x40, unit #3. Jan 6 01:04:02 $SERVER kernel: 3w-xxxx: scsi2: AEN: ERROR: Drive error: Port #3. Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error encountered on port 3 on controller ID:2. Check cables and drives for media errors. (0xa) Jan 6 01:04:10 $SERVER kernel: Debug: sleeping function called from invalid context at include/asm/uaccess.h:556 Jan 6 01:04:10 $SERVER kernel: in_atomic():0[expected: 0], irqs_disabled():1 Jan 6 01:04:10 $SERVER kernel: [<c011fbe9>] __might_sleep+0x7d/0x88 Jan 6 01:04:10 $SERVER kernel: [<f885f056>] tw_ioctl+0x478/0xb07 [3w_xxxx] Jan 6 01:04:10 $SERVER kernel: [<c011fec9>] autoremove_wake_function+0x0/0x2d Jan 6 01:04:10 $SERVER kernel: [<f883f905>] scsi_done+0x0/0x16 [scsi_mod] Jan 6 01:04:10 $SERVER kernel: [<f8860529>] tw_scsi_queue+0x163/0x1f1 [3w_xxxx] Jan 6 01:04:10 $SERVER kernel: [<f883f748>] scsi_dispatch_cmd+0x1e9/0x24f [scsi_mod] Jan 6 01:04:10 $SERVER kernel: [<f884417e>] scsi_request_fn+0x297/0x30d [scsi_mod] Jan 6 01:04:10 $SERVER kernel: [<c0221f68>] __generic_unplug_device+0x2b/0x2d Jan 6 01:04:10 $SERVER kernel: [<c0221f7f>] generic_unplug_device+0x15/0x21 Jan 6 01:04:10 $SERVER kernel: [<c022288d>] blk_execute_rq+0x88/0xb0 Jan 6 01:04:10 $SERVER kernel: [<c022096e>] elv_set_request+0xa/0x17 Jan 6 01:04:10 $SERVER kernel: [<c022251f>] get_request+0x1de/0x1e8 Jan 6 01:04:10 $SERVER kernel: [<c01a97c3>] task_has_capability+0x4a/0x52 Jan 6 01:04:10 $SERVER kernel: [<c0225cd5>] sg_scsi_ioctl+0x2bf/0x3c1 Jan 6 01:04:10 $SERVER kernel: [<c02261aa>] scsi_cmd_ioctl+0x3d3/0x475 Jan 6 01:04:10 $SERVER kernel: [<c014d41b>] handle_mm_fault+0xbd/0x175 Jan 6 01:04:10 $SERVER kernel: [<c011ad67>] do_page_fault+0x1ae/0x5c6 Jan 6 01:04:10 $SERVER kernel: [<c014e4a6>] vma_adjust+0x286/0x2d6 Jan 6 01:04:10 $SERVER kernel: [<f88228ea>] sd_ioctl+0xb3/0xd4 [sd_mod] Jan 6 01:04:10 $SERVER kernel: [<c02246e8>] blkdev_ioctl+0x328/0x334 The 3ware boards are running the latest firmware and I'm using the stock driver in the kernel. I *did* have to upgrade mdadm to 1.12.0, as the stock version doesn't support stacked arrays. 3dmd is running. Any ideas as to what I can do to prevent this in the future? Having the system hang every time a disk dies is, well, less than optimal. Thanks. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University