On Wed, Jul 12, 2023 at 12:14 PM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> On Wed, Jul 12, 2023 at 10:06:56AM +0200, Paolo Bonzini wrote:
> >On 7/11/23 22:21, Mike Christie wrote:
> >>What was the issue you are seeing?
> >>
> >>Was it something like you get the UA. We retry then on one of the
> >>retries the sense is not setup correctly, so the scsi error handler
> >>runs? That fails and the device goes offline?
> >>
> >>If you turn on scsi debugging you would see:
> >>
> >>
> >>[ 335.445922] sd 0:0:0:0: [sda] tag#15 Add. Sense: Reported luns data has changed
> >>[ 335.445922] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445925] sd 0:0:0:0: [sda] tag#16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445929] sd 0:0:0:0: [sda] tag#17 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> >>[ 335.445932] sd 0:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 00 db 4f c0 00 00 20 00
> >>[ 335.445934] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445936] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445938] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445940] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445942] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.445945] sd 0:0:0:0: [sda] tag#17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>[ 335.451447] scsi host0: scsi_eh_0: waking up 0/2/2
> >>[ 335.451453] scsi host0: Total of 2 commands on 1 devices require eh work
> >>[ 335.451457] sd 0:0:0:0: [sda] tag#16 scsi_eh_0: requesting sense
> >
> >Does this log come from internal discussions within Oracle?
> >
> >>I don't know the qemu scsi code well, but I scanned the code for my co-worker
> >>and my guess was commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2 had a race in it.
> >>
> >>How is locking done? when it is a bus level UA but there are multiple devices
> >>on the bus?
> >
> >No locking should be necessary, the code is single threaded. However,
> >what can happen is that two consecutive calls to
> >virtio_scsi_handle_cmd_req_prepare use the unit attention ReqOps, and
> >then the second virtio_scsi_handle_cmd_req_submit finds no unit
> >attention (see the loop in virtio_scsi_handle_cmd_vq). That can
> >definitely explain the log above.
>
> Yes, this seems to be the case!
> Thank you both for the help!
>
> Following Paolo's advice, I'm preparing a series for QEMU to solve the
> problem!

Series posted here:
https://lore.kernel.org/qemu-devel/20230712134352.118655-1-sgarzare@xxxxxxxxxx/

Stefano
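
For readers following the thread, the ordering problem Paolo describes can be sketched with a toy model. This is not QEMU code: every name below (toy_bus, toy_req, toy_prepare, toy_submit, toy_handle_batch) is hypothetical, and the model only assumes what the message above says, namely that the handler is chosen in a prepare step, that the unit attention is consumed in a later submit step, and that a batch of requests is prepared before any of them is submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
enum toy_ops { TOY_OPS_NORMAL, TOY_OPS_UNIT_ATTENTION };

struct toy_bus {
    bool ua_pending;          /* one bus-level unit attention is queued */
};

struct toy_req {
    enum toy_ops ops;         /* handler chosen at prepare time */
    bool sense_valid;         /* did submit produce real sense data? */
};

/* Phase 1: pick the handler from the current bus state.
 * The pending unit attention is looked at but not consumed here. */
static void toy_prepare(const struct toy_bus *bus, struct toy_req *req)
{
    req->ops = bus->ua_pending ? TOY_OPS_UNIT_ATTENTION : TOY_OPS_NORMAL;
    req->sense_valid = false;
}

/* Phase 2: run the chosen handler. The unit attention handler
 * consumes the pending unit attention if it is still there. */
static void toy_submit(struct toy_bus *bus, struct toy_req *req)
{
    if (req->ops != TOY_OPS_UNIT_ATTENTION) {
        return;
    }
    if (bus->ua_pending) {
        req->sense_valid = true;
        bus->ua_pending = false;
    }
    /* Otherwise an earlier request already consumed it, and this
     * request fails without any sense data to report. */
}

/* Prepare the whole batch, then submit the whole batch.
 * Returns how many requests chose the unit attention handler
 * but completed without sense data. */
static size_t toy_handle_batch(struct toy_bus *bus,
                               struct toy_req *reqs, size_t n)
{
    size_t stale = 0;

    assert(bus != NULL && reqs != NULL);

    for (size_t i = 0; i < n; i++) {
        toy_prepare(bus, &reqs[i]);
    }
    for (size_t i = 0; i < n; i++) {
        toy_submit(bus, &reqs[i]);
        if (reqs[i].ops == TOY_OPS_UNIT_ATTENTION && !reqs[i].sense_valid) {
            stale++;
        }
    }
    return stale;
}
```

With one pending unit attention and a batch of two requests, both requests pick the unit attention handler during prepare, the first submit consumes it, and the second is left with no sense data. That would be consistent with the failed command and all-zero sense buffer in the log quoted above, without any locking being involved.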