Sorry it has taken me so long to get back to this....
I've tested these patches with both my tape drive and with scsi_debug tape emulation.
see:
https://github.com/johnmeneghini/tape_tests
All hardware tests are passing and everything is working as expected with the tape drive tests, but the power on reset behavior
of the scsi_debug test is still showing the some strangeness.
https://github.com/johnmeneghini/tape_tests/blob/master/tape_reset_debug.sh
Specifically, every time you reload the scsi_debug driver the SCSI mid layer clears the POR UA. If I am not mistaken, your
intention with adding the counters for ua_new_media_ctr and ua_por_ctr to the mid layer was to catch these events and report
them to the upper layer driver.
Here's what the scsi_debug test does:
[tape_tests]# ./tape_reset_debug.sh 1 3 1 1
modprobe -r scsi_debug
modprobe scsi_debug tur_ms_to_ready=10000 ptype=1 max_luns=1 dev_size_mb=10000
[Wed Dec 11 15:35:48 2024] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[Wed Dec 11 15:35:48 2024] scsi host8: scsi_debug: version 0191 [20210520]
dev_size_mb=10000, opts=0x0, submit_queues=1, statistics=0
[Wed Dec 11 15:35:48 2024] scsi 8:0:0:0: Sequential-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7
[Wed Dec 11 15:35:48 2024] scsi 8:0:0:0: Power-on or device reset occurred
^^^^^^^^^^^^^^^^^^^^^^
Here's where the scsi layer is clearing the POR UA.
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: Attached scsi tape st1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: st1: try direct i/o: yes (alignment 4 B)
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: Attached scsi generic sg3 type 1
[0:0:0:0] disk ATA Samsung SSD 840 4B0Q /dev/sda 3500253855022021d /dev/sg0
[7:0:0:0] tape QUANTUM ULTRIUM 4 U53F /dev/st0 - /dev/sg1
[7:0:1:0] enclosu LSI virtualSES 02 - - /dev/sg2
[8:0:0:0] tape Linux scsi_debug 0191 /dev/st1 - /dev/sg3
[N:0:0:1] disk INTEL SSDPEDMW400G4__1 /dev/nvme0n1 -
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 0/0 ready 0
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (10000):
IM_REP_EN
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] Error: 2, cmd: 0 0 0 0 0 0
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] Sense Key : Not Ready [current]
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] Add. Sense: Logical unit is in process of becoming ready
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
st_chk_result was run here... but it looks like scsi_get_ua_por_ctr(STp->device) didn't report the first POR UA.
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] check_tape: 1141: CHKRES_NOT_READY pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] st_flush: 1404: pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 0/0 ready 1
Sleeping for 20 seconds
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 1026
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Can't read block limits.
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A second mt status command is done after the tape is ready...
So it looks like the initial POR UA is never recorded in ua_por_ctr.
Following this, resetting the device works.
sg_reset --target /dev/nst1
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 0/0 ready 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 1026
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Can't read block limits.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Error: 402, cmd: 1a 0 0 0 c 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Add. Sense: Invalid field in cdb
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 1026
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scsi mid layer has NOT set was_reset.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] No Mode Sense.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Block size: 0, buffer size: 4096 (1 blocks).
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] check_tape: 1282: CHKRES_READY pos_unknown 0 was_reset 0/0 ready 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 0/0 ready 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] st_flush: 1404: pos_unknown 0 was_reset 1/1 ready 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We see the scsi mid layer sets was_reset here.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 1/1 ready 0
Sleeping for 5 seconds
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: Power-on or device reset occurred
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Power on/reset recognized.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here's your code : scsi_get_ua_por_ctr(STp->device) found the reset here.
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Error: 2, cmd: 0 0 0 0 0 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Sense Key : Unit Attention [current]
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Add. Sense: Scsi bus reset occurred
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Position unknown is now set.
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 1026
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Can't read block limits.
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Error: 402, cmd: 1a 0 0 0 c 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
SCSI 2 tape drive:
All in all this is no different from what we have w/out your patches, so I have no problem approving this change.
One thing that did get fixed by this patch series is I can now use the sg device to reset the scsi_debug driver.
This is a good improvement. I can now run my new script.
https://github.com/johnmeneghini/tape_tests/blob/master/tape_reset_debug_sg.sh
sg_reset --target /dev/sg3
Sleeping for 5 seconds
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: Power-on or device reset occurred
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Power on/reset recognized.
This didn't work before your change.
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Error: 2, cmd: 0 0 0 0 0 0
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Sense Key : Unit Attention [current]
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Add. Sense: Scsi bus reset occurred
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 2
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 1026
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Can't read block limits.
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
So all in all I think this is an improvement... I'd like to ask Martin to merge these changes in v6.13.
/John
P.S. There are still many issues with the scsi_debug tape emulation. See my test results for more information about how the
scsi_debug tape emulation test are failing at:
https://bugzilla.kernel.org/show_bug.cgi?id=219419#c21
On 11/25/24 09:02, Kai Mäkisara wrote:
This set applies to 6.12 + the three patches accepted earlier (and in
linux-next).
The first patch re-applies after device reset some settings changed
by the user (partition, density, block size). This is the same as in v1.
The second and third patch address the case where more than one ULD
access the same device. The Unit Attention (UA) sense data is sent only
to one ULD and the others miss it. The st driver needs to find out
if device reset or media change has happened.
The second patch adds counters for New Media and Power On/Reset (POR)
Unit Attentions to the scsi_device struct. The third one changes st
so that these are used: if the value in the scsi_device struct does
not match the one stored locally, the corresponding UA has happened.
Use of the was_reset flag has been removed.
The fourth patch adds a file to sysfs to tell the user if reads/writes
to a tape have been blocked following a device reset.
---
Changes since V1:
- replace the patch removing was_reset handling with patches two and three
- add sysfs file reset_blocked
Kai Mäkisara (4):
scsi: st: Restore some drive settings after reset
scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT
ATTENTIONs
scsi: st: Modify st.c to use the new scsi_error counters
scsi: st: Add sysfs file reset_blocked
Documentation/scsi/st.rst | 5 +++
drivers/scsi/scsi_error.c | 12 +++++++
drivers/scsi/st.c | 73 +++++++++++++++++++++++++++++++++-----
drivers/scsi/st.h | 6 ++++
include/scsi/scsi_device.h | 9 +++++
5 files changed, 97 insertions(+), 8 deletions(-)