All devices on host blocked

I have a large JBOD attached to my server via an LSI SAS2308 PCI card (mpt2sas driver). I have about 40 drives right now, assembled into 4 Linux software RAID sets, and I am using those RAID volumes as back-end devices for GPFS.
Everything was working fine about a week ago, when I had 20 drives and 2 RAID volumes. Then I added 20 new disks, all the same model, and now I frequently see every device behind the SAS card report device_blocked, immediately followed by device_unblocked. These events coincide with periods of many seconds with no data throughput, and they happen often enough to cause major throughput problems. I have seen similar problems in the past, but they were accompanied by some kind of disk-specific error, and I could fix the situation by removing the disk. In this case there are no other errors in any log besides the device_blocked and device_unblocked messages on every single device.
This system is not in production yet, so I can blow it all away if I need to, but I really want to understand what is causing this so that, if it comes back once we go into production, I'll be able to fix it without major disruptions. I suspect there is a misbehaving drive, but nothing points to a single drive, and I could be completely wrong about that. Does anybody have any clue where to look?

Here is what the error logs look like:

Jun 11 19:29:17 storage003 kernel: sd 6:0:0:0: device_blocked, handle(0x0016)
Jun 11 19:29:17 storage003 kernel: sd 6:0:1:0: device_blocked, handle(0x000b)
Jun 11 19:29:17 storage003 kernel: sd 6:0:2:0: device_blocked, handle(0x000c)
Jun 11 19:29:17 storage003 kernel: ses 6:0:3:0: device_blocked, handle(0x000e)
Jun 11 19:29:17 storage003 kernel: sd 6:0:4:0: device_blocked, handle(0x000f)
Jun 11 19:29:17 storage003 kernel: sd 6:0:5:0: device_blocked, handle(0x0010)
... Same thing for the rest of the devices on host6
Jun 11 19:29:18 storage003 kernel: sd 6:0:0:0: device_unblocked and set to running, handle(0x0016)
Jun 11 19:29:18 storage003 kernel: sd 6:0:1:0: device_unblocked and set to running, handle(0x000b)
Jun 11 19:29:18 storage003 kernel: sd 6:0:2:0: device_unblocked and set to running, handle(0x000c)
Jun 11 19:29:18 storage003 kernel: ses 6:0:3:0: device_unblocked and set to running, handle(0x000e)
Jun 11 19:29:18 storage003 kernel: sd 6:0:4:0: device_unblocked and set to running, handle(0x000f)
Jun 11 19:29:18 storage003 kernel: sd 6:0:5:0: device_unblocked and set to running, handle(0x0010)
... Same thing for the rest of the devices again.
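
To check whether any one device is blocked first, or stays blocked longer than the others, the messages can be paired up per device. The following is a minimal sketch, assuming the syslog line format shown above. The names `parse_events` and `blocked_durations` are hypothetical helpers, not part of any driver tooling. Syslog timestamps carry no year, so `year` is an assumed parameter, and `%b` parsing assumes an English locale.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
#   Jun 11 19:29:17 storage003 kernel: sd 6:0:0:0: device_blocked, handle(0x0016)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"\w+\s+(?P<addr>\d+:\d+:\d+:\d+):\s+"
    r"(?P<event>device_blocked|device_unblocked)\b"
    r".*handle\((?P<handle>0x[0-9a-fA-F]+)\)"
)


def parse_events(lines, year=2000):
    """Return (timestamp, scsi_address, event, handle) tuples in file order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip unrelated log lines
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["addr"], m["event"], m["handle"]))
    return events


def blocked_durations(events):
    """Map each SCSI address to a list of blocked durations in seconds."""
    pending = {}                   # address -> time of the unmatched block
    durations = defaultdict(list)
    for ts, addr, event, _handle in events:
        if event == "device_blocked":
            pending[addr] = ts
        elif addr in pending:
            start = pending.pop(addr)
            durations[addr].append((ts - start).total_seconds())
    return dict(durations)
```

Feeding the log file's lines to `parse_events` and the result to `blocked_durations` gives one list of durations per SCSI address. Syslog timestamps have only one-second resolution, so this gives a coarse view at best.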

Thanks,
Mike Robbert


