On 14/04/2023 08:36, Shin'ichiro Kawasaki wrote:
Hello Bart, Recently, I built a new blktests trial environment on QEMU. With this environment, I observe scsi/007 failure. FYI, let me share blktests output [1] and kernel message [2].
I did not notice which kernel you are using - did you mention it somewhere?
I found the failure depends on kernel configs for debug such as KASAN. When I enable KASAN, the test case fails. When I disable KASAN, the test case passes. It looks that the failure depends on the slow kernel (and/or slow machine). The test case sets 1 second to the block layer timeout to trigger the SCSI error handler. It also sets 3 seconds to scsi_debug delay assuming the error handler completes before the 3 seconds. From the kernel message, it looks that the error handler takes longer than the 3 seconds delay, so I/O completes as success before the error handler completion. This I/O success is not expected then the test case fails. As a trial, I extended the scsi_debug delay time to 10 seconds, then I observed the test case passes. Do you expect the I/O success by slow SCSI error handler? If so, the test case needs improvement by extending the scsi_debug delay time.
The failure may be due to one of my changes. Please see https://lore.kernel.org/lkml/5bdbfbbc-bac1-84a1-5f50-33a443e3292a@xxxxxxxxxx/
[1] blktests output scsi/007 (Trigger the SCSI error handler) [failed] runtime 25.594s ... 13.646s --- tests/scsi/007.out 2023-04-06 10:11:07.926670528 +0900 +++ /home/shin/Blktests/blktests/results/nodev/scsi/007.out.bad 2023-04-14 16:09:45.447892078 +0900 @@ -1,3 +1,3 @@ Running scsi/007 -Reading from scsi_debug failed +Reading from scsi_debug succeeded Test complete [2] kernel message [ 3714.407999] run blktests scsi/007 at 2023-04-14 16:09:31 [ 3714.523102] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1) [ 3714.525023] scsi host3: scsi_debug: version 0191 [20210520] dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0 [ 3714.533733] scsi 3:0:0:0: Direct-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 3714.543198] sd 3:0:0:0: Power-on or device reset occurred [ 3714.543250] sd 3:0:0:0: Attached scsi generic sg2 type 0 [ 3714.550936] sd 3:0:0:0: [sdc] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB) [ 3714.554821] sd 3:0:0:0: [sdc] Write Protect is off [ 3714.558024] sd 3:0:0:0: [sdc] Mode Sense: 73 00 10 08 [ 3714.562414] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 3714.566601] sd 3:0:0:0: [sdc] Preferred minimum I/O size 512 bytes [ 3714.570045] sd 3:0:0:0: [sdc] Optimal transfer size 524288 bytes [ 3714.586397] sd 3:0:0:0: [sdc] Attached SCSI disk [ 3715.999917] sd 3:0:0:0: [sdc] tag#103 abort scheduled [ 3716.015174] sd 3:0:0:0: [sdc] tag#103 aborting command [ 3716.019935] sd 3:0:0:0: [sdc] tag#103 retry aborted command [ 3717.090803] sd 3:0:0:0: [sdc] tag#178 previous abort failed [ 3717.098780] scsi host3: Waking error handler thread [ 3717.098917] scsi host3: scsi_eh_3: waking up 0/1/1 [ 3717.106279] scsi host3: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1 [ 3717.111212] scsi host3: Total of 1 commands on 1 devices require eh work [ 3717.116170] sd 3:0:0:0: scsi_eh_3: Sending BDR [ 3717.120493] sd 3:0:0:0: [sdc] tag#178 scsi_eh_done result: 2 [ 3717.125183] sd 3:0:0:0: [sdc] tag#178 scsi_send_eh_cmnd timeleft: 10000 [ 3717.130301] sd 3:0:0:0: Power-on or device reset occurred [ 3717.134935] sd 3:0:0:0: [sdc] tag#178 scsi_send_eh_cmnd: scsi_eh_completed_normally 2001 [ 3717.140241] sd 3:0:0:0: [sdc] tag#178 scsi_eh_tur return: 2001 [ 3719.033180] sd 3:0:0:0: [sdc] tag#178 scsi_eh_done result: 0 [ 3719.037656] sd 3:0:0:0: [sdc] tag#178 scsi_send_eh_cmnd timeleft: 8114 [ 3719.043293] sd 3:0:0:0: [sdc] tag#178 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002 [ 3719.049631] sd 3:0:0:0: [sdc] tag#178 scsi_eh_tur return: 2002 [ 3719.055489] sd 3:0:0:0: [sdc] tag#178 scsi_eh_3: flush retry cmd [ 3719.061420] scsi host3: waking up host to restart [ 3719.069512] scsi host3: scsi_eh_3: sleeping [ 3720.175944] sd 3:0:0:0: [sdc] tag#180 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=1s [ 3720.182709] sd 3:0:0:0: [sdc] tag#180 CDB: Read(10) 28 00 00 00 3f 80 00 00 08 00 [ 3720.189825] I/O error, dev sdc, sector 16256 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2 [ 3722.342015] sd 3:0:0:0: Power-on or device reset occurred [ 3725.349863] sd 3:0:0:0: Power-on or device reset occurred [ 3727.275069] sd 3:0:0:0: [sdc] tag#79 Medium access timeout failure. Offlining disk! [ 3727.279944] sd 3:0:0:0: [sdc] tag#79 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=7s [ 3727.285131] sd 3:0:0:0: [sdc] tag#79 CDB: Read(10) 28 00 00 00 3f 80 00 00 08 00 [ 3727.289961] I/O error, dev sdc, sector 16256 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 3727.294862] Buffer I/O error on dev sdc, logical block 2032, async page read [ 3727.996161] sd 3:0:0:0: [sdc] Synchronizing SCSI cache