On 7/11/23 19:06, Stefano Garzarella wrote:
CCing `./scripts/get_maintainer.pl -f drivers/scsi/virtio_scsi.c`,
since I found a few things in the virtio-scsi driver...
FYI we have seen that Linux has problems with a QEMU patch for the
virtio-scsi device (details at the bottom of this email in the revert
commit message and BZ).
This is what I found when I looked at the Linux code:
In scsi_report_sense() in linux/drivers/scsi/scsi_error.c linux calls
scsi_report_lun_change() that set `sdev_target->expecting_lun_change =
1` when we receive a UNIT ATTENTION with REPORT LUNS CHANGED
(sshdr->asc == 0x3f && sshdr->ascq == 0x0e).
When `sdev_target->expecting_lun_change = 1` is set and we call
scsi_check_sense(), for example to check the next UNIT ATTENTION, it
will return NEEDS_RETRY, that I think will cause the issues we are
seeing.
`sdev_target->expecting_lun_change` is reset only in
scsi_decide_disposition() when `REPORT_LUNS` command returns with
SAM_STAT_GOOD.
That command is issued in scsi_report_lun_scan() called by
__scsi_scan_target(), called for example by scsi_scan_target(),
scsi_scan_host(), etc.
So, checking QEMU, we send VIRTIO_SCSI_EVT_RESET_RESCAN during hotplug
and VIRTIO_SCSI_EVT_RESET_REMOVED during hotunplug. In both cases now we
send also the UNIT ATTENTION.
In the virtio-scsi driver, when we receive VIRTIO_SCSI_EVT_RESET_RESCAN
(hotplug) we call scsi_scan_target() or scsi_add_device(). Both of them
will call __scsi_scan_target() at some points, sending `REPORT_LUNS`
command to the device. This does not happen for
VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug). Indeed if I remove the
UNIT ATTENTION from the hotunplug in QEMU, everything works well.
So, I tried to add a scan also for VIRTIO_SCSI_EVT_RESET_REMOVED:
The point of having the event queue is to avoid expensive scans of the
entire host, so I don't think this is the right thing to do.
On the Linux side, one change we might do is to remove the printk for
adapters that do process hotplug/hotunplug, using a new flag in
scsi_host_template. There are several callers of scsi_add_device() and
scsi_remove_device() in adapter code, so at least these should not issue
the printk:
drivers/scsi/aacraid/commsup.c
drivers/scsi/arcmsr/arcmsr_hba.c
drivers/scsi/esas2r/esas2r_main.c
drivers/scsi/hpsa.c
drivers/scsi/ipr.c
drivers/scsi/megaraid/megaraid_sas_base.c
drivers/scsi/mvumi.c
drivers/scsi/pmcraid.c
drivers/scsi/smartpqi/smartpqi_init.c
drivers/scsi/virtio_scsi.c
drivers/scsi/vmw_pvscsi.c
drivers/scsi/xen-scsifront.c
Paolo
Another thing I noticed is that in QEMU maybe we should set the UNIT
ATTENTION first and then send the event on the virtqueue, because the
scan should happen after the unit attention, but I don't know if in any
case the unit attention is processed before the virtqueue.
I mean something like this:
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 45b95ea070..13db40f4f3 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1079,8 +1079,8 @@ static void virtio_scsi_hotplug(HotplugHandler
*hotplug_dev, DeviceState *dev,
};
virtio_scsi_acquire(s);
- virtio_scsi_push_event(s, &info);
scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+ virtio_scsi_push_event(s, &info);
virtio_scsi_release(s);
}
}
@@ -1111,8 +1111,8 @@ static void virtio_scsi_hotunplug(HotplugHandler
*hotplug_dev, DeviceState *dev,
if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
virtio_scsi_acquire(s);
- virtio_scsi_push_event(s, &info);
scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+ virtio_scsi_push_event(s, &info);
virtio_scsi_release(s);
}
}
At this point I think the problem is on the handling of the
VIRTIO_SCSI_EVT_RESET_REMOVED event in the virtio-scsi driver, where
somehow we have to redo the bus scan, but scsi_scan_host() doesn't seem
to be enough when the event rate is very high.
I don't know if along with this fix, we also need to limit the rate in
QEMU somehow.
Sorry for the length of this email, but I'm not familiar with SCSI and
wanted some suggestions on how to proceed.
Paolo, Stefan, Linux SCSI maintainers, any suggestion?
Thanks,
Stefano
On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote:
This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2.
That commit causes several problems in Linux as described in the BZ.
In particular, after a while, other devices on the bus are no longer
usable even if those devices are not affected by the hotunplug.
This may be a problem in Linux, but we have not been able to identify
it so far. So better to revert this patch until we find a solution.
Also, Oracle, which initially proposed this patch for a problem with
Solaris, seems to have already reversed it downstream:
https://linux.oracle.com/errata/ELSA-2023-12065.html
Suggested-by: Thomas Huth <thuth@xxxxxxxxxx>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702
Cc: qemu-stable@xxxxxxxxxx
Cc: Mark Kanda <mark.kanda@xxxxxxxxxx>
Signed-off-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
---
include/hw/scsi/scsi.h | 1 -
hw/scsi/scsi-bus.c | 18 ------------------
hw/scsi/virtio-scsi.c | 2 --
3 files changed, 21 deletions(-)
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index e2bb1a2fbf..7c8adf10b1 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -198,7 +198,6 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus
*bus, BlockBackend *blk,
BlockdevOnError rerror,
BlockdevOnError werror,
const char *serial, Error **errp);
-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, SCSIDevice *d,
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f80f4cb4fc..42a915f8b7 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1617,24 +1617,6 @@ static int scsi_ua_precedence(SCSISense sense)
return (sense.asc << 8) | sense.ascq;
}
-void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
-{
- int prec1, prec2;
- if (sense.key != UNIT_ATTENTION) {
- return;
- }
-
- /*
- * Override a pre-existing unit attention condition, except for a
more
- * important reset condition.
- */
- prec1 = scsi_ua_precedence(bus->unit_attention);
- prec2 = scsi_ua_precedence(sense);
- if (prec2 < prec1) {
- bus->unit_attention = sense;
- }
-}
-
void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
{
int prec1, prec2;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 45b95ea070..1f56607100 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1080,7 +1080,6 @@ static void virtio_scsi_hotplug(HotplugHandler
*hotplug_dev, DeviceState *dev,
virtio_scsi_acquire(s);
virtio_scsi_push_event(s, &info);
- scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
virtio_scsi_release(s);
}
}
@@ -1112,7 +1111,6 @@ static void virtio_scsi_hotunplug(HotplugHandler
*hotplug_dev, DeviceState *dev,
if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
virtio_scsi_acquire(s);
virtio_scsi_push_event(s, &info);
- scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
virtio_scsi_release(s);
}
}
--
2.41.0