Hi all; we are seeing a problem where, when we pull a disk out of our disk array (even one that's not actively being used), the entire IO subsystem in Linux hangs. Here are some details:

I have an IBM BladeCenter with an LSI EXP3000 SAS expander holding 12 1TB Seagate SAS disks. Relevant lspci output for the SAS controllers:

# lspci | grep LSI
02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
08:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
14:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)

On this system we are running an embedded/custom version of Linux in a ramdisk, based on Linux 2.6.27.25. Unfortunately it's quite difficult (effectively impossible) for us to upgrade to a newer kernel at this time; however, if this problem rings a bell I'm happy to backport patches, fixes, etc.

As I mentioned, when we pull one of the disks from the EXP3000 the IO subsystem completely hangs. Since we're running on a ramdisk this doesn't hang the system outright, but any subsequent attempt to do disk IO hangs, so we have to power-cycle the blade (because reboot tries to write to the disks). This is quite reproducible in our environment, BUT it is very timing-sensitive, as shown below: if we enable too much logging, etc., the problem goes away.

We've been in touch with some driver folks at LSI and they seem to feel that the problem is a SCSI midlayer race condition rather than a bug in the mptlinux driver itself. So I'm hoping someone here has ideas.

On a working disk pull we get log messages like this:

mptscsih: ioc1: attempting host reset! (sc=ffff8804619e2640)
mptscsih: ioc1: host reset: SUCCESS (sc=ffff8804619e2640)
mptbase: ioc1: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
sd 3:0:11:0: [sdx] Synchronizing SCSI cache
sd 3:0:11:0: Device offlined - not ready after error recovery
sg_cmd_done: device detached

Note that the "host reset: SUCCESS" message here comes BEFORE the "Synchronizing SCSI cache" message.

On a hanging disk pull we get log messages like this:

mptscsih: ioc1: attempting host reset! (sc=ffff8804622b48c0)
mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
sd 3:0:11:0: [sdx] Synchronizing SCSI cache

and it hangs right there. In this situation the host reset has not completed before we try to sync the cache, and that appears to be the indicator of the problem.
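To state the suspicion more concretely in code (this is only an illustration, not a patch I've tried): the cache sync appears to be issued while the host is still in error recovery, so the command can be queued but never dispatched. The sketch below uses the existing scsi_host_in_recovery() and scsi_device_online() midlayer helpers; the function name is made up, and where (or whether) such a check belongs in the removal path is exactly the open question:

#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/*
 * Illustrative only -- not a tested patch.  The hypothesis: sd_shutdown()
 * ends up issuing SYNCHRONIZE CACHE before we ever see "host reset:
 * SUCCESS", i.e. while the host is still in a recovery state, so the
 * request sits on the queue forever.
 */
static int cache_sync_would_hang(struct scsi_device *sdp)
{
	/* host state stays in a recovery state until the EH finishes */
	if (scsi_host_in_recovery(sdp->host))
		return 1;

	/* an offlined device can't accept the flush either */
	if (!scsi_device_online(sdp))
		return 1;

	return 0;
}

Whether such a check would have to live in sd_shutdown(), somewhere in the transport class removal path, or nowhere at all is part of what I'm hoping someone can comment on.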
Here's a backtrace of the hung task; note we're in sd_sync_cache():

Call Trace:
 [<ffffffff8048d88f>] _spin_lock_irqsave+0x1f/0x50
 [<ffffffff8048daf2>] _spin_unlock_irqrestore+0x12/0x40
 [<ffffffffa00080fc>] scsi_get_command+0x8c/0xc0 [scsi_mod]
 [<ffffffff8048c11d>] schedule_timeout+0xad/0xf0
 [<ffffffff8034df1d>] elv_next_request+0x15d/0x290
 [<ffffffff8048b1ea>] wait_for_common+0xba/0x170
 [<ffffffff80237460>] default_wake_function+0x0/0x10
 [<ffffffff80353b77>] blk_execute_rq+0x67/0xa0
 [<ffffffff80350e71>] get_request_wait+0x21/0x1d0
 [<ffffffff8023e972>] vprintk+0x1f2/0x490
 [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
 [<ffffffffa000e5a4>] scsi_execute+0xf4/0x150 [scsi_mod]
 [<ffffffffa000e691>] scsi_execute_req+0x91/0x100 [scsi_mod]
 [<ffffffffa00f89bc>] sd_sync_cache+0xac/0x100 [sd_mod]
 [<ffffffff80360000>] compat_blkdev_ioctl+0x80/0x1740
 [<ffffffff80364062>] kobject_get+0x12/0x20
 [<ffffffffa00fac51>] sd_shutdown+0x71/0x160 [sd_mod]
 [<ffffffffa00fad7c>] sd_remove+0x3c/0x80 [sd_mod]
 [<ffffffffa0012122>] scsi_bus_remove+0x42/0x60 [scsi_mod]
 [<ffffffff803d8ba9>] __device_release_driver+0x99/0x100
 [<ffffffff803d8d08>] device_release_driver+0x28/0x40
 [<ffffffff803d8087>] bus_remove_device+0xb7/0xf0
 [<ffffffff803d66c9>] device_del+0x119/0x1a0
 [<ffffffffa001245c>] __scsi_remove_device+0x5c/0xb0 [scsi_mod]
 [<ffffffffa00124d8>] scsi_remove_device+0x28/0x40 [scsi_mod]
 [<ffffffffa00125a0>] __scsi_remove_target+0xa0/0xd0 [scsi_mod]
 [<ffffffffa0012640>] __remove_child+0x0/0x30 [scsi_mod]
 [<ffffffffa0012656>] __remove_child+0x16/0x30 [scsi_mod]
 [<ffffffff803d5c3b>] device_for_each_child+0x3b/0x60
 [<ffffffffa0012606>] scsi_remove_target+0x36/0x70 [scsi_mod]
 [<ffffffffa010c5f5>] sas_rphy_remove+0x75/0x80 [scsi_transport_sas]
 [<ffffffffa010c609>] sas_rphy_delete+0x9/0x20 [scsi_transport_sas]
 [<ffffffffa010c642>] sas_port_delete+0x22/0x140 [scsi_transport_sas]
 [<ffffffffa013c230>] mptsas_del_end_device+0x230/0x2c0 [mptsas]
 [<ffffffffa013c8a1>] mptsas_hotplug_work+0x291/0xb20 [mptsas]
 [<ffffffff80369c9a>] vsnprintf+0x2ea/0x7c0
 [<ffffffff80287dac>] free_hot_cold_page+0x1fc/0x2f0
 [<ffffffff80287ed8>] __pagevec_free+0x38/0x50
 [<ffffffff8028b730>] release_pages+0x180/0x1d0
 [<ffffffff80362789>] __next_cpu+0x19/0x30
 [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
 [<ffffffff80362789>] __next_cpu+0x19/0x30
 [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
 [<ffffffffa013e4a9>] mptsas_firmware_event_work+0xd29/0x1110 [mptsas]
 [<ffffffff8022dc94>] update_curr+0x84/0xd0
 [<ffffffff80230370>] __dequeue_entity+0x60/0x90
 [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
 [<ffffffff802364fb>] finish_task_switch+0x3b/0xd0
 [<ffffffff8048b911>] thread_return+0xa3/0x662
 [<ffffffffa013d780>] mptsas_firmware_event_work+0x0/0x1110 [mptsas]
 [<ffffffff80250e65>] run_workqueue+0x85/0x150
 [<ffffffff80250fcf>] worker_thread+0x9f/0x110
 [<ffffffff802553b0>] autoremove_wake_function+0x0/0x30
 [<ffffffff80250f30>] worker_thread+0x0/0x110
 [<ffffffff80254ef7>] kthread+0x47/0x90
 [<ffffffff80254eb0>] kthread+0x0/0x90
 [<ffffffff8020d5f9>] child_rip+0xa/0x11
 [<ffffffff80254eb0>] kthread+0x0/0x90
 [<ffffffff80254eb0>] kthread+0x0/0x90
 [<ffffffff8020d5ef>] child_rip+0x0/0x11

According to sd.c:sd_sync_cache() it's supposed to retry the scsi_execute_req() three times and then give up, but instead it never returns. It seems that if the host reset has not completed by the time this event is pulled off the workqueue, we get into some kind of deadlock.
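For reference, the retry logic in question looks roughly like this (paraphrased from memory of the 2.6.27-era drivers/scsi/sd.c with the error reporting omitted, so please check the real source rather than trusting this). The point is that the three retries only happen if scsi_execute_req() returns at all; in the trace above we are parked inside blk_execute_rq()'s wait for completion, so neither the retry nor the give-up path is ever reached:

/* Abbreviated paraphrase of sd_sync_cache(), not copied verbatim. */
static int sd_sync_cache(struct scsi_disk *sdkp)
{
	struct scsi_device *sdp = sdkp->device;
	struct scsi_sense_hdr sshdr;
	int retries, res = 0;

	if (!scsi_device_online(sdp))
		return -ENODEV;

	for (retries = 3; retries > 0; --retries) {
		unsigned char cmd[10] = { 0 };

		cmd[0] = SYNCHRONIZE_CACHE;
		/* rest of the CDB stays zero: flush the whole cache */
		res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, &sshdr,
				       SD_TIMEOUT, SD_MAX_RETRIES);
		/* in the hanging case we never come back from this call,
		 * so the retry loop never gets a chance to give up */
		if (res == 0)
			break;
	}

	return res ? -EIO : 0;
}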
We're pretty much stuck on this, so I was wondering whether anyone has thoughts, or avenues to look at, that could move us forward on resolving it. Thanks!