Paul Smith <paul@xxxxxxxxxxxxxxxxx> wrote:
> Hi all; we are seeing a problem where, when we pull a disk out of our
> disk array (even one that's not actively being used), the entire IO
> subsystem in Linux hangs. Here are some details:
>
> I have an IBM Bladecenter with an LSI EXP3000 SAS expander with 12 1TB
> Seagate SAS disks. Relevant lspci output for the SAS controllers:
>
> # lspci | grep LSI
> 02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
> 08:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
> 14:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
>
> On this system we are running an embedded/custom version of Linux in a
> ramdisk, based on Linux 2.6.27.25. Unfortunately it's quite
> difficult/impossible for us to upgrade to a newer kernel at this time;
> however, if this problem rings a bell I'm happy to backport patches,
> fixes, etc.
>
> As I mentioned, when we pull one of the disks from the EXP3000 the IO
> subsystem completely hangs. Since we're running on a ramdisk this
> doesn't hang our system completely, but any attempt to do any disk IO
> thereafter hangs, so we have to power-cycle the blade (because reboot
> tries to write to the disks). This is quite reproducible in our
> environment BUT it is very timing-sensitive, as shown below. If we
> enable too much logging, etc. it goes away.

Have you tried a minimum level of logging like the following, without the
error going away?

"sysctl -w dev.scsi.logging_level=4100"

(See the sketch after the quoted log excerpts below for what 4100 decodes
to.)

> We've been in touch with some driver folks at LSI and they seem to feel
> that the problem is a SCSI midlayer race condition, rather than in the
> mptlinux driver itself. So I'm hoping someone here has ideas.
>
> On a working disk pull we get log messages like this:
>
> mptscsih: ioc1: attempting host reset! (sc=ffff8804619e2640)
> mptscsih: ioc1: host reset: SUCCESS (sc=ffff8804619e2640)
> mptbase: ioc1: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
> mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
> sd 3:0:11:0: [sdx] Synchronizing SCSI cache
> sd 3:0:11:0: Device offlined - not ready after error recovery
> sg_cmd_done: device detached
>
> Note that the "host reset: SUCCESS" message here comes BEFORE the
> "Synchronizing SCSI cache" message. On a hanging disk pull we get log
> messages like this:
>
> mptscsih: ioc1: attempting host reset! (sc=ffff8804622b48c0)
> mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
> sd 3:0:11:0: [sdx] Synchronizing SCSI cache
>
> and it hangs right here. In this situation the host reset does not
> complete before we try to sync, and that appears to be the indicator of
> the problem.
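Regarding the logging_level=4100 value above: dev.scsi.logging_level packs a
3-bit verbosity level per SCSI logging area. If I'm remembering the shift
values in include/scsi/scsi_logging.h correctly (worth double-checking
against your 2.6.27 tree), 4100 enables error-recovery logging at level 4
plus midlayer completion logging at level 1, i.e. roughly:

    /* decode_scsi_logging_level.c -- sketch only.  The two shift values
     * below are quoted from memory of include/scsi/scsi_logging.h, so
     * verify them against your kernel tree before relying on this. */
    #include <stdio.h>

    #define SCSI_LOG_ERROR_SHIFT        0   /* error handler / recovery    */
    #define SCSI_LOG_MLCOMPLETE_SHIFT  12   /* midlayer command completion */

    int main(void)
    {
            unsigned int level = (4u << SCSI_LOG_ERROR_SHIFT) |
                                 (1u << SCSI_LOG_MLCOMPLETE_SHIFT);

            printf("dev.scsi.logging_level = %u\n", level);  /* prints 4100 */
            return 0;
    }

The idea is to get the error handler's progress into the log without
perturbing the timing the way heavier logging apparently does.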
> Here's a backtrace; note we're in sd_sync_cache():
>
> Call Trace:
>  [<ffffffff8048d88f>] _spin_lock_irqsave+0x1f/0x50
>  [<ffffffff8048daf2>] _spin_unlock_irqrestore+0x12/0x40
>  [<ffffffffa00080fc>] scsi_get_command+0x8c/0xc0 [scsi_mod]
>  [<ffffffff8048c11d>] schedule_timeout+0xad/0xf0
>  [<ffffffff8034df1d>] elv_next_request+0x15d/0x290
>  [<ffffffff8048b1ea>] wait_for_common+0xba/0x170
>  [<ffffffff80237460>] default_wake_function+0x0/0x10
>  [<ffffffff80353b77>] blk_execute_rq+0x67/0xa0
>  [<ffffffff80350e71>] get_request_wait+0x21/0x1d0
>  [<ffffffff8023e972>] vprintk+0x1f2/0x490
>  [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
>  [<ffffffffa000e5a4>] scsi_execute+0xf4/0x150 [scsi_mod]
>  [<ffffffffa000e691>] scsi_execute_req+0x91/0x100 [scsi_mod]
>  [<ffffffffa00f89bc>] sd_sync_cache+0xac/0x100 [sd_mod]
>  [<ffffffff80360000>] compat_blkdev_ioctl+0x80/0x1740
>  [<ffffffff80364062>] kobject_get+0x12/0x20
>  [<ffffffffa00fac51>] sd_shutdown+0x71/0x160 [sd_mod]
>  [<ffffffffa00fad7c>] sd_remove+0x3c/0x80 [sd_mod]
>  [<ffffffffa0012122>] scsi_bus_remove+0x42/0x60 [scsi_mod]
>  [<ffffffff803d8ba9>] __device_release_driver+0x99/0x100
>  [<ffffffff803d8d08>] device_release_driver+0x28/0x40
>  [<ffffffff803d8087>] bus_remove_device+0xb7/0xf0
>  [<ffffffff803d66c9>] device_del+0x119/0x1a0
>  [<ffffffffa001245c>] __scsi_remove_device+0x5c/0xb0 [scsi_mod]
>  [<ffffffffa00124d8>] scsi_remove_device+0x28/0x40 [scsi_mod]
>  [<ffffffffa00125a0>] __scsi_remove_target+0xa0/0xd0 [scsi_mod]
>  [<ffffffffa0012640>] __remove_child+0x0/0x30 [scsi_mod]
>  [<ffffffffa0012656>] __remove_child+0x16/0x30 [scsi_mod]
>  [<ffffffff803d5c3b>] device_for_each_child+0x3b/0x60
>  [<ffffffffa0012606>] scsi_remove_target+0x36/0x70 [scsi_mod]
>  [<ffffffffa010c5f5>] sas_rphy_remove+0x75/0x80 [scsi_transport_sas]
>  [<ffffffffa010c609>] sas_rphy_delete+0x9/0x20 [scsi_transport_sas]
>  [<ffffffffa010c642>] sas_port_delete+0x22/0x140 [scsi_transport_sas]
>  [<ffffffffa013c230>] mptsas_del_end_device+0x230/0x2c0 [mptsas]
>  [<ffffffffa013c8a1>] mptsas_hotplug_work+0x291/0xb20 [mptsas]
>  [<ffffffff80369c9a>] vsnprintf+0x2ea/0x7c0
>  [<ffffffff80287dac>] free_hot_cold_page+0x1fc/0x2f0
>  [<ffffffff80287ed8>] __pagevec_free+0x38/0x50
>  [<ffffffff8028b730>] release_pages+0x180/0x1d0
>  [<ffffffff80362789>] __next_cpu+0x19/0x30
>  [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
>  [<ffffffff80362789>] __next_cpu+0x19/0x30
>  [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
>  [<ffffffffa013e4a9>] mptsas_firmware_event_work+0xd29/0x1110 [mptsas]
>  [<ffffffff8022dc94>] update_curr+0x84/0xd0
>  [<ffffffff80230370>] __dequeue_entity+0x60/0x90
>  [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
>  [<ffffffff802364fb>] finish_task_switch+0x3b/0xd0
>  [<ffffffff8048b911>] thread_return+0xa3/0x662
>  [<ffffffffa013d780>] mptsas_firmware_event_work+0x0/0x1110 [mptsas]
>  [<ffffffff80250e65>] run_workqueue+0x85/0x150
>  [<ffffffff80250fcf>] worker_thread+0x9f/0x110
>  [<ffffffff802553b0>] autoremove_wake_function+0x0/0x30
>  [<ffffffff80250f30>] worker_thread+0x0/0x110
>  [<ffffffff80254ef7>] kthread+0x47/0x90
>  [<ffffffff80254eb0>] kthread+0x0/0x90
>  [<ffffffff8020d5f9>] child_rip+0xa/0x11
>  [<ffffffff80254eb0>] kthread+0x0/0x90
>  [<ffffffff80254eb0>] kthread+0x0/0x90
>  [<ffffffff8020d5ef>] child_rip+0x0/0x11
>
> According to sd.c:sd_sync_cache() it's supposed to retry the
> scsi_execute_req() three times and then give up, but instead it never
> returns.
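For reference, the retry loop in question looks roughly like this in
2.6.27-era sd.c (paraphrased from memory, so treat it as a sketch of the
logic rather than the exact source):

    /* Rough paraphrase of the sd_sync_cache() retry loop.  The
     * SYNCHRONIZE CACHE command is handed to scsi_execute_req(), which
     * submits it through blk_execute_rq() and waits for completion.
     * That wait is where the backtrace above is parked. */
    for (retries = 3; retries > 0; --retries) {
            unsigned char cmd[10] = { 0 };

            cmd[0] = SYNCHRONIZE_CACHE;
            /* rest of the CDB left zero: flush everything */
            res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
                                   SD_TIMEOUT, SD_MAX_RETRIES);
            if (res == 0)
                    break;
    }

So the "retry three times then give up" behaviour only applies to attempts
that actually complete with an error; an attempt that never completes blocks
sd_sync_cache(), and with it the whole removal path, indefinitely.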
> It seems that if the host reset is not completed yet, then we
> find this event on the workqueue and get into some kind of deadlock
> situation.
>
> We're kind of stuck on this and I was wondering if anyone has any
> thoughts or avenues to look at to move us forward on resolving this?

Can you run "cat /sys/class/scsi_host/*/state" when you are in the hung
state? If the host is in recovery, no IOs will move forward.

I assume that if you can get a run with the 4100 level of logging, it will
show a host reset being sent but no "waking up host to restart" message
(unless the reset is being generated for other reasons, outside of the scsi
error handler).

-andmike
--
Michael Anderson
andmike@xxxxxxxxxxxxxxxxxx
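P.S. By "in recovery" I mean the host state that the midlayer checks before
letting new commands through. Roughly (paraphrasing scsi_host_in_recovery()
from include/scsi/scsi_host.h from memory, so the exact set of states may
differ slightly in 2.6.27):

    /* Sketch of the check that gates new IO while error recovery is in
     * progress; paraphrased from memory of include/scsi/scsi_host.h. */
    static inline int scsi_host_in_recovery(struct Scsi_Host *shost)
    {
            return shost->shost_state == SHOST_RECOVERY ||
                   shost->shost_state == SHOST_CANCEL_RECOVERY ||
                   shost->shost_state == SHOST_DEL_RECOVERY;
    }

While that returns true, requests queued to the host's devices are held
back, which would be consistent with sd_sync_cache() hanging if the host
reset never finishes and the state never returns to SHOST_RUNNING.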