Re: mpt3sas and /sys/block/<dev>/device/timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/25/22 01:10, Forza wrote:
Hi,

I am currently facing an issue with a Broadcom HBA 9500-8i SAS controller where 'blkdiscard /dev/sdX' on WD SA500 SATA SSDs cause an IO timeout and device reset.

* LSI/Broadcom HBA 9500-8i SAS/SATA controller
* WD RED SA500 NAS SATA SSD 2TB (WDS200T1R0A-68A4W0)
   Drive FW: 411000WR
* Alpine Linux kernel 5.15.48
* /sys/block/sdf/queue/
   discard_granularity:512
   discard_max_bytes:134217216
   discard_max_hw_bytes:134217216

I simply issue a 'blkdiscard /dev/sdf' and after about 30 seconds the following errors show in dmesg (quite a lot of rows). The full blkdiscard takes between 1.5 and 2.5 minutes depending on the SSD I run on (I have 4 drives). The problem is the same if I run fstrim on a mounted XFS or Btrfs (but not ext4) filesystem on these drives.

[  +0.000003] scsi target6:0:4: handle(0x0029), sas_address(0x5003048020db4543), phy(3) [  +0.000003] scsi target6:0:4: enclosure logical id(0x5003048020db457f), slot(3) [  +0.000003] scsi target6:0:4: enclosure level(0x0000), connector name( C0.1) [  +0.000003] sd 6:0:4:0: No reference found at driver, assuming scmd(0x00000000eb0d9438) might have completed
[  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x00000000eb0d9438)
[  +0.000012] sd 6:0:4:0: attempting task abort!scmd(0x0000000075f63919), outstanding for 30397 ms & timeout 30000 ms [  +0.000003] sd 6:0:4:0: [sdg] tag#2762 CDB: opcode=0x42 42 00 00 00 00 00 00 00 18 00 [  +0.000002] scsi target6:0:4: handle(0x0029), sas_address(0x5003048020db4543), phy(3) [  +0.000004] scsi target6:0:4: enclosure logical id(0x5003048020db457f), slot(3) [  +0.000002] scsi target6:0:4: enclosure level(0x0000), connector name( C0.1) [  +0.000003] sd 6:0:4:0: No reference found at driver, assuming scmd(0x0000000075f63919) might have completed
[  +0.000003] sd 6:0:4:0: task abort: SUCCESS scmd(0x0000000075f63919)
[  +0.255021] sd 6:0:4:0: Power-on or device reset occurred


Does the mpt3sas driver or the HBA controller not follow the /sys/block/<dev>/device/timeout value? I have mine set to 180 seconds.

It seems that there are many hardcoded timeout values in the driver code.

https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt3sas/mpt3sas_scsih.c https://github.com/torvalds/linux/blob/6a0a17e6c6d1091ada18d43afd87fb26a82a9823/drivers/scsi/mpt3sas/mpt3sas_scsih.c#L3303-L3306

Any thoughts other than trying to avoid using discards/fstrim? I did reach out to Broadcom for support, and they claim it is a fault in the fstrim code and that on FreeBSD they had fixed this. Not sure how relevant that statement is though.

Thanks,
Forza


I used blktrace to capture some events that might help debug where the issues are. I don't know how to interpret the results though.


1) running blkdiscard /dev/sdf1
https://paste.tnonline.net/files/MYymVOfP4ROQ_blktrace.tar.xz

2) running mkfs.btrfs /dev/sdf1
https://paste.tnonline.net/files/5UUvxYlQD46Q_blktrace_mkfs.tar.xz

3) running fstrim on mounted btrfs
https://paste.tnonline.net/files/gultxaa7OP8P_blktrace_fstrim.tar.xz
https://paste.tnonline.net/files/8Vy7D0uR2Mk5_dmesg_fstrim.txt

Example output:
Input file sdf.blktrace.0 added
  8,80   1        1     0.000000000  5527  A   D 2048 + 8388607 <- (8,81) 0
8,80 1 2 0.000002456 5527 Q D 2048 + 8388607 [blkdiscard]
  8,80   1        3     0.000004876  5527  X   D 2048 / 264191 [blkdiscard]
  8,80   1        4     0.000009894  5527  G   D 2048 + 262143 [blkdiscard]
  8,80   1        5     0.000010754  5527  P   N [blkdiscard]
8,80 1 6 0.000012881 5527 X D 264191 / 526334 [blkdiscard] 8,80 1 7 0.000013925 5527 G D 264191 + 262143 [blkdiscard]
  8,80   1        8     0.000014896  5527  U   N [blkdiscard] 1
  8,80   1        9     0.000015890  5527  I   D 2048 + 262143 [blkdiscard]
  8,80   1       10     0.000027270  5527  D   D 2048 + 262143 [blkdiscard]
  8,80   1       11     0.000039543  5527  P   N [blkdiscard]
8,80 1 12 0.000040775 5527 X D 526334 / 788477 [blkdiscard] 8,80 1 13 0.000041572 5527 G D 526334 + 262143 [blkdiscard]
  8,80   1       14     0.000041842  5527  U   N [blkdiscard] 1
8,80 1 15 0.000042175 5527 I D 264191 + 262143 [blkdiscard] 8,80 1 16 0.000044967 5527 D D 264191 + 262143 [blkdiscard]
  8,80   1       17     0.000047802  5527  P   N [blkdiscard]
8,80 1 18 0.000048798 5527 X D 788477 / 1050620 [blkdiscard] 8,80 1 19 0.000049493 5527 G D 788477 + 262143 [blkdiscard]
  8,80   1       20     0.000049715  5527  U   N [blkdiscard] 1
8,80 1 21 0.000050022 5527 I D 526334 + 262143 [blkdiscard] 8,80 1 22 0.000052386 5527 D D 526334 + 262143 [blkdiscard]
  8,80   1       23     0.000055063  5527  P   N [blkdiscard]
...
snip ~100k lines
...
8,80 4 10940 97.593173642 548 D DS 3904898591 + 262143 [kworker/4:1H]
  8,80   4    10941    97.603196002     0  C  DS 3900704303 + 3932145 [0]
8,80 4 10942 97.603246256 548 D DS 3905160734 + 262143 [kworker/4:1H]
  8,80   4    10943    97.613166491     0  C  DS 3900704303 + 4194288 [0]
8,80 4 10944 97.613216413 548 D DS 3905422877 + 262143 [kworker/4:1H]
  8,80   4    10945    97.623198445     0  C  DS 3900704303 + 4456431 [0]
8,80 4 10946 97.623249722 548 D DS 3905685020 + 262143 [kworker/4:1H]
  8,80   4    10947    97.633162688     0  C  DS 3900704303 + 4718574 [0]
8,80 4 10948 97.633212563 548 D DS 3905947163 + 262143 [kworker/4:1H]
  8,80   4    10949    97.643154345     0  C  DS 3900704303 + 4980717 [0]
8,80 4 10950 97.643204045 548 D DS 3906209306 + 262143 [kworker/4:1H]
  8,80   4    10951    97.653270127     0  C  DS 3900704303 + 5242860 [0]
8,80 4 10952 97.653319479 548 D DS 3906471449 + 262143 [kworker/4:1H]
  8,80   4    10953    97.663202719     0  C  DS 3900704303 + 5505003 [0]
8,80 4 10954 97.663253381 548 D DS 3906733592 + 262143 [kworker/4:1H]
  8,80   4    10955    97.673273616     0  C  DS 3900704303 + 5767146 [0]
8,80 4 10956 97.673321526 548 D DS 3906995735 + 33257 [kworker/4:1H]
  8,80   4    10957    97.683204146     0  C  DS 3900704303 + 6029289 [0]
  8,80   4    10958    97.693695724     0  C  DS 3900704303 + 6291432 [0]
  8,80   4    10959    97.703304939     0  C   D 3892315696 + 6553575 [0]
  8,80   4    10960    97.713278422     0  C   D 3867149875 + 6553575 [0]
  8,80   4    10961    97.723309633     0  C   D 3867149875 + 6815718 [0]
  8,80   4    10962    97.733228328     0  C   D 3867149875 + 7077861 [0]
  8,80   4    10963    97.743206422     0  C   D 3867149875 + 7340004 [0]
  8,80   4    10964    97.753131380     0  C   D 3867149875 + 7602147 [0]
  8,80   4    10965    97.763353833     0  C   D 3867149875 + 7864290 [0]
  8,80   4    10966    97.773132921     0  C   D 3867149875 + 8126433 [0]
  8,80   4    10967    97.783267092     0  C   D 3867149875 + 8388576 [0]
  8,80   4    10968    97.784616230     0  C   D 3867149875 + 8388607 [0]
  8,80   4    10969    97.793083064     0  C   D 3875538482 + 262143 [0]
  8,80   4    10970    97.803187466     0  C   D 3875538482 + 524286 [0]
  8,80   4    10971    97.813136517     0  C   D 3875538482 + 786429 [0]
  8,80   4    10972    97.823092905     0  C   D 3875538482 + 1048572 [0]
  8,80   4    10973    97.833401461     0  C   D 3875538482 + 1310715 [0]
  8,80   4    10974    97.843100535     0  C   D 3875538482 + 1572858 [0]
  8,80   4    10975    97.853242242     0  C   D 3875538482 + 1835001 [0]
  8,80   4    10976    97.863173598     0  C   D 3875538482 + 2097144 [0]
  8,80   4    10977    97.873196944     0  C   D 3875538482 + 2359287 [0]
  8,80   4    10978    97.883177261     0  C   D 3875538482 + 2621430 [0]
  8,80   4    10979    97.893191605     0  C   D 3875538482 + 2883573 [0]
  8,80   4    10980    97.903370941     0  C   D 3875538482 + 3145716 [0]
  8,80   4    10981    97.913331748     0  C   D 3875538482 + 3407859 [0]
  8,80   4    10982    97.923181282     0  C   D 3875538482 + 3670002 [0]
  8,80   4    10983    97.933252172     0  C   D 3875538482 + 3932145 [0]
  8,80   4    10984    97.943217689     0  C   D 3875538482 + 4194288 [0]
  8,80   4    10985    97.953153062     0  C   D 3875538482 + 4456431 [0]
  8,80   4    10986    97.963283883     0  C   D 3875538482 + 4718574 [0]
  8,80   4    10987    97.973210464     0  C   D 3892315696 + 6291432 [0]
  8,80   4    10988    97.983769074     0  C  DS 3900704303 + 6324689 [0]

CPU0 (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 Read depth:             2               Write depth:            32
 PC Reads Queued: , iB   PC Writes Queued: , iB
 PC Read Disp.:   , iB   PC Write Disp.:   , iB
 PC Reads Req.:                  PC Writes Req.:
 PC Reads Compl.:                PC Writes Compl.:
 IO unplugs:          2360               Timer unplugs:         348
CPU1 (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 Read depth:             2               Write depth:            32
 IO unplugs:          5500               Timer unplugs:         778
CPU2 (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 Read depth:             2               Write depth:            32
 IO unplugs:          2419               Timer unplugs:         354
CPU3 (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 Read depth:             2               Write depth:            32
 IO unplugs:           977               Timer unplugs:         144
CPU4 (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 Read depth:             2               Write depth:            32
 IO unplugs:          1334               Timer unplugs:         197
CPU5 (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 Read depth:             2               Write depth:            32
 PC Reads Queued: , iB   PC Writes Queued: , iB
 PC Read Disp.:   , iB   PC Write Disp.:   , iB
 PC Reads Req.:                  PC Writes Req.:
 PC Reads Compl.:                PC Writes Compl.:
 IO unplugs:           518               Timer unplugs:          71

Total (sdf):
 Reads Queued:    , iB   Writes Queued:    , iB
 Read Dispatches: , iB   Write Dispatches: , iB
 Reads Requeued:                 Writes Requeued:
 Reads Completed: , iB   Writes Completed: , iB
 Read Merges:     , iB   Write Merges:     , iB
 PC Reads Queued: , iB   PC Writes Queued: , iB
 PC Read Disp.:   , iB   PC Write Disp.:   , iB
 PC Reads Req.:                  PC Writes Req.:
 PC Reads Compl.:                PC Writes Compl.:
 IO unplugs:         13108               Timer unplugs:        1892
Events (sdf): Skips: 0 forward (0 -   0.0%)
Trace started at Sat Jun 25 12:05:11 2022


Thanks,
Forza



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux