Re: mvsas escalated kernel crash and ata mapping mvsas driver question

Jelle de Jong <jelledejong@xxxxxxxxxxxxx> · Thu, 26 Jan 2017 12:53:45 +0100

Dear Jack,

I set the queue_depth to 1 and the timeout to 300 for all SATA disks
connected to the mvsas controller [ARC-1300ix-16].

Does this mean that ata21 is mapped to /dev/sdq?

root@sweeney:~# dmesg | grep ata21 | grep device
[    4.788568] sas: ata21: end_device-0:0:26: dev error handler

root@sweeney:~# lsscsi -v | grep end_device-0:0:26
  dir: /sys/bus/scsi/devices/0:0:14:0 
[/sys/devices/pci0000:00/0000:00:1c.0/0000:08:00.0/host0/port-0:0/expander-0:0/port-0:0:26/end_device-0:0:26/target0:0:14/0:0:14:0]

root@sweeney:~# lsscsi -v | grep 0:0:14:0
[0:0:14:0]   disk    ATA      WDC WD1003FBYX-0 1V01  /dev/sdq
  dir: /sys/bus/scsi/devices/0:0:14:0 
[/sys/devices/pci0000:00/0000:00:1c.0/0000:08:00.0/host0/port-0:0/expander-0:0/port-0:0:26/end_device-0:0:26/target0:0:14/0:0:14:0]
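
The lookup above can be wrapped in two small helpers. This is only a sketch: the function names are made up for illustration, and it assumes the kernel logs the "sas: ataN: end_device-...: dev error handler" line in the same form as in the dmesg output above, which may differ between kernel versions.

```shell
# Extract the end_device name logged for an ata port (e.g. "ata21").
# Reads dmesg-style text on stdin; assumes the message format shown above.
ata_to_end_device() {
    sed -n "s/.*sas: $1: \(end_device-[0-9:]*\): dev error handler.*/\1/p" | head -n 1
}

# Print the block device(s) whose resolved sysfs path passes through
# the given SAS end device.
# Usage: end_device_to_disk end_device-0:0:26 [sysfs_root]
end_device_to_disk() {
    end_dev=$1
    sysfs_root=${2:-/sys}   # hypothetical override, handy for trying it on a copy
    for link in "$sysfs_root"/block/sd*; do
        [ -e "$link" ] || continue
        target=$(readlink -f "$link")
        case "$target" in
            */"$end_dev"/*) echo "/dev/${link##*/}" ;;
        esac
    done
}
```

Used as: end_device_to_disk "$(dmesg | ata_to_end_device ata21)". Given the sysfs path shown above, I would expect this to report /dev/sdq here.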

I added the following to my rc.local

vim /etc/rc.local

# Needs bash: sd{c..r} is brace expansion, which plain sh (dash) does not expand.
for disk in sd{c..r}; do
    echo deadline > /sys/block/$disk/queue/scheduler
    echo 0 > /sys/block/$disk/queue/iosched/front_merges
    echo 150 > /sys/block/$disk/queue/iosched/read_expire
    echo 1500 > /sys/block/$disk/queue/iosched/write_expire
    echo 1 > /sys/block/$disk/device/queue_depth    # depth 1 effectively disables NCQ
    echo 300 > /sys/block/$disk/device/timeout      # SCSI command timeout in seconds
done
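
To confirm after a reboot that the settings actually took effect, the values can be read back from sysfs. A minimal sketch follows; the function name and the optional sysfs root argument are made up for illustration, and only the three attributes written above are checked.

```shell
# Print the scheduler, queue depth and timeout currently set for one disk.
# Usage: show_disk_tuning sdq [sysfs_root]
show_disk_tuning() {
    disk=$1
    sysfs_root=${2:-/sys}   # hypothetical override, defaults to the real sysfs
    for attr in queue/scheduler device/queue_depth device/timeout; do
        file=$sysfs_root/block/$disk/$attr
        if [ -r "$file" ]; then
            printf '%s %s=%s\n' "$disk" "$attr" "$(cat "$file")"
        else
            # Attribute missing or unreadable, e.g. the disk does not exist
            printf '%s %s=unavailable\n' "$disk" "$attr"
        fi
    done
}
```

For example, show_disk_tuning sdq prints one line per attribute for /dev/sdq.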

I hope the performance impact of queue_depth = 1 is not too much.

Kind regards,

Jelle de Jong

On 26/01/17 11:17, Jack Wang wrote:
2017-01-26 10:51 GMT+01:00 Jelle de Jong <jelledejong@xxxxxxxxxxxxx>:
Hello everybody,

I have a server that gets seemingly random kernel crashes, most likely due
to an escalation of events from the mvsas-based disk controller.

The hard disks should be okay; I replaced a whole bunch to be sure, but the
server does not become stable. I cannot figure out how to map, for example,
ata21.00 to a disk so that I can run a deep badblocks check.

I have the complete boot log, including the kernel crash log, saved with
additional information:

http://paste.debian.net/plainh/96325d89

Can somebody take a look and maybe help?

08:00.0 SCSI storage controller: Areca Technology Corp. ARC-1300ix-16
16-Port PCI-Express to SAS Non-RAID Host Adapter (rev 02)

Kind regards,

Jelle de Jong

Your IO error seems related to NCQ, have you tried to disable NCQ?

echo 1 > /sys/block/sdX/device/queue_depth

Maybe trying 4.10-rc5 is also an option?

Regards,
Jack

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html