Hi Michael, On 2/24/25 8:31 PM, Michael S. Tsirkin wrote: > On Mon, Feb 24, 2025 at 08:49:09AM +0100, Eric Auger wrote: >> Hi Michael, >> >> On 2/21/25 12:42 AM, Michael S. Tsirkin wrote: >>> Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory >>> accesses during the hang. >>> >>> Invalid read at addr 0x102877002, size 2, region '(null)', reason: rejected >>> Invalid write at addr 0x102877A44, size 2, region '(null)', reason: rejected >>> ... >>> >>> It was traced down to virtio-console. Kexec works fine if virtio-console >>> is not in use. >>> >>> The issue is that virtio-console continues to write to the MMIO even after >>> underlying virtio-pci device is reset. >>> >>> Additionally, Eric noticed that IOMMUs are reset before devices, if >>> devices are not reset on shutdown they continue to poke at guest memory >>> and get errors from the IOMMU. Some devices get wedged then. >>> >>> The problem can be solved by breaking all virtio devices on virtio >>> bus shutdown, then resetting them. >>> >>> Reported-by: Eric Auger <eauger@xxxxxxxxxx> >>> Reported-by: Hongyu Ning <hongyu.ning@xxxxxxxxxxxxxxx> >>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> >> Tested-by: Eric Auger <eric.auger@xxxxxxxxxx> >> >> Thanks! >> >> Eric >>> --- >>> drivers/virtio/virtio.c | 31 +++++++++++++++++++++++++++++++ >>> 1 file changed, 31 insertions(+) >>> >>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c >>> index c1cc1157b380..e5b29520d3b2 100644 >>> --- a/drivers/virtio/virtio.c >>> +++ b/drivers/virtio/virtio.c >>> @@ -377,6 +377,36 @@ static void virtio_dev_remove(struct device *_d) >>> of_node_put(dev->dev.of_node); >>> } >>> >>> +static void virtio_dev_shutdown(struct device *_d) >>> +{ >>> + struct virtio_device *dev = dev_to_virtio(_d); >>> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver); >>> + >>> + /* >>> + * Stop accesses to or from the device. >>> + * We only need to do it if there's a driver - no accesses otherwise. >>> + */ >>> + if (!drv) >>> + return; >>> + >>> + /* >>> + * Some devices get wedged if you kick them after they are >>> + * reset. Mark all vqs as broken to make sure we don't. >>> + */ >>> + virtio_break_device(dev); >>> + /* >>> + * The below virtio_synchronize_cbs() guarantees that any interrupt >>> + * for this line arriving after virtio_synchronize_vqs() has completed >>> + * is guaranteed to see vq->broken as true. >>> + */ >>> + virtio_synchronize_cbs(dev); >>> + /* >>> + * As IOMMUs are reset on shutdown, this will block device access to memory. >>> + * Some devices get wedged if this happens, so reset to make sure it does not. >>> + */ > > Eric, > Could you pls drop the below line (reset), and retest? > I want to make sure the above comment is right. > Thanks! If I removed the line below, I hit the issue again: qemu-system-aarch64: virtio: zero sized buffers are not allowed So to me this is needed and the comment is right. Eric > >>> + dev->config->reset(dev); > > >>> +} >>> + >>> static const struct bus_type virtio_bus = { >>> .name = "virtio", >>> .match = virtio_dev_match, >>> @@ -384,6 +414,7 @@ static const struct bus_type virtio_bus = { >>> .uevent = virtio_uevent, >>> .probe = virtio_dev_probe, >>> .remove = virtio_dev_remove, >>> + .shutdown = virtio_dev_shutdown, >>> }; >>> >>> int __register_virtio_driver(struct virtio_driver *driver, struct module *owner) >