On 7/11/23 22:21, Mike Christie wrote:
What was the issue you are seeing? Was it something like this: you get the UA, we retry, and then on one of the retries the sense is not set up correctly, so the scsi error handler runs? That fails and the device goes offline? If you turn on scsi debugging you would see:

[ 335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
[ 335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
[ 335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
[ 335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
[ 335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense
Does this log come from internal discussions within Oracle?
I don't know the qemu scsi code well, but I scanned the code for my co-worker and my guess was that commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it. How is locking done when it is a bus-level UA but there are multiple devices on the bus?
No locking should be necessary; the code is single-threaded. However, what can happen is that two consecutive calls to virtio_scsi_handle_cmd_req_prepare both pick the unit attention ReqOps, and then the second virtio_scsi_handle_cmd_req_submit finds no unit attention, presumably because the first submit has already reported and cleared it (see the loop in virtio_scsi_handle_cmd_vq). That can definitely explain the log above.
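To make the ordering problem concrete, here is a toy model of a batched prepare/submit loop. It is only a sketch and not the QEMU code: Bus, Request, prepare, submit and handle_queue are hypothetical names, and it assumes that a unit attention is cleared as soon as it has been reported once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bus:
    # Pending bus-level unit attention, or None if there is none (toy model).
    unit_attention: Optional[str] = None


@dataclass
class Request:
    tag: int
    ops: str = ""                # handler chosen at prepare time
    sense: Optional[str] = None  # sense data filled in at submit time


def prepare(bus: Bus, req: Request) -> None:
    # The handler is chosen from the bus state as seen at prepare time.
    req.ops = "unit_attention" if bus.unit_attention is not None else "normal"


def submit(bus: Bus, req: Request) -> None:
    if req.ops == "unit_attention":
        # Report the pending UA and clear it (assumption: report once).
        # If an earlier request already consumed it, no sense is set.
        req.sense = bus.unit_attention
        bus.unit_attention = None


def handle_queue(bus: Bus, reqs: List[Request]) -> None:
    # Phase 1: prepare every request in the batch.
    for req in reqs:
        prepare(bus, req)
    # Phase 2: submit them; the bus state may have changed since prepare.
    for req in reqs:
        submit(bus, req)


bus = Bus(unit_attention="REPORTED LUNS DATA HAS CHANGED")
reqs = [Request(tag=1), Request(tag=2)]
handle_queue(bus, reqs)
for req in reqs:
    print(req.tag, req.ops, req.sense)
```

In this model both requests are prepared with the unit attention handler, but only the first one gets sense data; the second one completes through the unit attention path with nothing to report. No second thread is involved, the stale decision comes purely from preparing the whole batch before submitting any of it.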
Paolo