mpt3sas and /sys/block/<dev>/device/timeout

Hi,

I am currently facing an issue with a Broadcom HBA 9500-8i SAS controller where running 'blkdiscard /dev/sdX' on WD SA500 SATA SSDs causes an I/O timeout and a device reset.

* LSI/Broadcom HBA 9500-8i SAS/SATA controller
* WD RED SA500 NAS SATA SSD 2TB (WDS200T1R0A-68A4W0)
  Drive FW: 411000WR
* Alpine Linux kernel 5.15.48
* /sys/block/sdf/queue/
  discard_granularity:512
  discard_max_bytes:134217216
  discard_max_hw_bytes:134217216
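For reference, the queue limits listed above can be read back with a few lines of Python. This is a minimal sketch: read_discard_limits and max_blocks_per_discard are hypothetical helper names, and it assumes 512-byte logical blocks (consistent with discard_granularity:512, but not checked against the drive's logical_block_size).

```python
from __future__ import annotations

from pathlib import Path

# Assumption: 512-byte logical blocks (matches discard_granularity:512 above).
LOGICAL_BLOCK = 512


def read_discard_limits(dev: str, sysfs_root: str = "/sys/block") -> dict[str, int]:
    """Read the discard_* queue attributes of a block device such as 'sdf'."""
    queue = Path(sysfs_root) / dev / "queue"
    names = ("discard_granularity", "discard_max_bytes", "discard_max_hw_bytes")
    return {name: int((queue / name).read_text().strip()) for name in names}


def max_blocks_per_discard(limits: dict[str, int], block_size: int = LOGICAL_BLOCK) -> int:
    """Largest single discard request, expressed in logical blocks."""
    return limits["discard_max_bytes"] // block_size
```

Calling max_blocks_per_discard(read_discard_limits("sdf")) on the affected host should give 262143 blocks for discard_max_bytes:134217216, i.e. one 512-byte block short of 128 MiB.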

I simply run 'blkdiscard /dev/sdf', and after about 30 seconds the following errors appear in dmesg (many lines of them). A full blkdiscard takes between 1.5 and 2.5 minutes, depending on which of my four SSDs I run it on. The problem is the same if I run fstrim on a mounted XFS or Btrfs filesystem on these drives, but not on ext4.
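A rough back-of-envelope for the full-device case: if the block layer splits a whole-device discard into requests no larger than discard_max_bytes, a 2 TB drive receives on the order of fifteen thousand discard requests. The sketch below assumes a nominal capacity of 2,000,000,000,000 bytes (the real size from 'blockdev --getsize64 /dev/sdf' may differ slightly) and ignores alignment effects; the helper names are hypothetical.

```python
def estimated_discard_requests(capacity_bytes: int, discard_max_bytes: int) -> int:
    """Ceiling division: number of chunks of at most discard_max_bytes.
    Ignores alignment/granularity effects, so this is only an estimate."""
    return (capacity_bytes + discard_max_bytes - 1) // discard_max_bytes


def average_ms_per_request(total_seconds: float, requests: int) -> float:
    """Total runtime divided by request count (several may be in flight at once)."""
    return total_seconds * 1000 / requests


NOMINAL_2TB = 2_000_000_000_000   # assumption: nominal capacity, not the measured size
DISCARD_MAX_BYTES = 134_217_216   # discard_max_bytes from the report

n = estimated_discard_requests(NOMINAL_2TB, DISCARD_MAX_BYTES)
fast = average_ms_per_request(90, n)    # 1.5 minutes
slow = average_ms_per_request(150, n)   # 2.5 minutes
```

This gives 14902 requests and an average of roughly 6 to 10 ms per request, which suggests the aborts concern individual commands that stayed outstanding for more than 30 s rather than the total runtime of the blkdiscard.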

[  +0.000003] scsi target6:0:4: handle(0x0029), sas_address(0x5003048020db4543), phy(3)
[  +0.000003] scsi target6:0:4: enclosure logical id(0x5003048020db457f), slot(3)
[  +0.000003] scsi target6:0:4: enclosure level(0x0000), connector name( C0.1)
[  +0.000003] sd 6:0:4:0: No reference found at driver, assuming scmd(0x00000000eb0d9438) might have completed
[  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x00000000eb0d9438)
[  +0.000012] sd 6:0:4:0: attempting task abort!scmd(0x0000000075f63919), outstanding for 30397 ms & timeout 30000 ms
[  +0.000003] sd 6:0:4:0: [sdg] tag#2762 CDB: opcode=0x42 42 00 00 00 00 00 00 00 18 00
[  +0.000002] scsi target6:0:4: handle(0x0029), sas_address(0x5003048020db4543), phy(3)
[  +0.000004] scsi target6:0:4: enclosure logical id(0x5003048020db457f), slot(3)
[  +0.000002] scsi target6:0:4: enclosure level(0x0000), connector name( C0.1)
[  +0.000003] sd 6:0:4:0: No reference found at driver, assuming scmd(0x0000000075f63919) might have completed
[  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x0000000075f63919)
[  +0.255021] sd 6:0:4:0: Power-on or device reset occurred
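The aborted command in the log can be decoded by hand: opcode 0x42 is the SCSI UNMAP(10) command, whose bytes 7-8 carry the parameter list length. A minimal sketch with hypothetical helper names that pulls the outstanding/timeout figures out of the abort message and decodes the CDB, assuming the SBC UNMAP layout of an 8-byte parameter list header followed by 16-byte block descriptors:

```python
from __future__ import annotations

import re

# Matches the abort message as it appears in the log above.
ABORT_RE = re.compile(r"outstanding for (\d+) ms & timeout (\d+) ms")


def parse_abort(line: str) -> tuple[int, int] | None:
    """Return (outstanding_ms, timeout_ms) from an abort line, or None."""
    m = ABORT_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def decode_unmap10(cdb_hex: str) -> dict[str, int]:
    """Decode an UNMAP(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x42:
        raise ValueError("not an UNMAP(10) CDB")
    param_len = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: parameter list length
    # Parameter list = 8-byte header + N * 16-byte block descriptors.
    descriptors = max(param_len - 8, 0) // 16
    return {
        "anchor": cdb[1] & 0x01,
        "group_number": cdb[6] & 0x1F,
        "param_list_len": param_len,
        "block_descriptors": descriptors,
    }
```

For the CDB above the parameter list length is 0x18 = 24 bytes, i.e. one block descriptor per UNMAP command, and the abort line reports a 30000 ms timeout.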


Do the mpt3sas driver or the HBA controller not honor the /sys/block/<dev>/device/timeout value? I have mine set to 180 seconds, yet the abort messages report "timeout 30000 ms".
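To make the mismatch explicit, the configured value can be compared with the timeout the abort message reports. This is a sketch with hypothetical helper names; it only compares the two numbers and says nothing about where the 30000 ms value comes from.

```python
from pathlib import Path


def sysfs_timeout_seconds(dev: str, sysfs_root: str = "/sys/block") -> int:
    """Read the SCSI command timeout (in seconds) configured for a device."""
    return int((Path(sysfs_root) / dev / "device" / "timeout").read_text().strip())


def timeout_matches(sysfs_seconds: int, logged_timeout_ms: int) -> bool:
    """True if the timeout reported in the abort message equals the sysfs value."""
    return sysfs_seconds * 1000 == logged_timeout_ms
```

With the 180-second setting, timeout_matches(sysfs_timeout_seconds("sdg"), 30000) returns False for the device named in the log.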

There seem to be many hardcoded timeout values in the driver code, for example:

https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt3sas/mpt3sas_scsih.c
https://github.com/torvalds/linux/blob/6a0a17e6c6d1091ada18d43afd87fb26a82a9823/drivers/scsi/mpt3sas/mpt3sas_scsih.c#L3303-L3306

Any thoughts, other than avoiding discards/fstrim altogether? I did contact Broadcom support; they claim it is a fault in the fstrim code and that it had been fixed on FreeBSD. I'm not sure how relevant that statement is, though.

Thanks,
Forza



