Re: [PATCH] virtio: Remove virtio devices on device_shutdown()

Jason Wang <jasowang@xxxxxxxxxx> · Mon, 17 Feb 2025 11:29:07 +0800

On Fri, Feb 14, 2025 at 8:16 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> On Fri, Feb 14, 2025 at 08:56:56AM +0100, Eric Auger wrote:
> > Hi,
> >
> > On 2/14/25 8:21 AM, Ning, Hongyu wrote:
> > >
> > >
> > > On 2025/2/6 16:59, Eric Auger wrote:
> > >> Hi,
> > >>
> > >> On 2/4/25 12:46 PM, Eric Auger wrote:
> > >>> Hi,
> > >>>
> > >>> On 2/3/25 3:48 PM, Michael S. Tsirkin wrote:
> > >>>> On Fri, Jan 31, 2025 at 10:53:15AM +0100, Eric Auger wrote:
> > >>>>> Hi Kirill, Michael
> > >>>>>
> > >>>>> On 8/8/24 9:51 AM, Kirill A. Shutemov wrote:
> > >>>>>> Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory
> > >>>>>> accesses during the hang.
> > >>>>>>
> > >>>>>>     Invalid read at addr 0x102877002, size 2, region '(null)',
> > >>>>>> reason: rejected
> > >>>>>>     Invalid write at addr 0x102877A44, size 2, region '(null)',
> > >>>>>> reason: rejected
> > >>>>>>     ...
> > >>>>>>
> > >>>>>> It was traced down to virtio-console. Kexec works fine if virtio-
> > >>>>>> console
> > >>>>>> is not in use.
> > >>>>>>
> > >>>>>> Looks like virtio-console continues to write to the MMIO even after
> > >>>>>> underlying virtio-pci device is removed.
> > >>>>>>
> > >>>>>> The problem can be mitigated by removing all virtio devices on virtio
> > >>>>>> bus shutdown.
> > >>>>>>
> > >>>>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > >>>>>> Reported-by: Hongyu Ning <hongyu.ning@xxxxxxxxxxxxxxx>
> > >>>>>
> > >>>>> Gentle ping on that patch that seems to have fallen though the cracks.
> > >>>>>
> > >>>>> I think this fix is really needed. I have another test case with a
> > >>>>> rebooting guest exposed with virtio-net (backed by vhost-net) and
> > >>>>> viommu. Since there is currently no shutdown for the virtio-net, on
> > >>>>> reboot, the IOMMU is disabled through the native_machine_shutdown()/
> > >>>>> x86_platform.iommu_shutdown() while the virtio-net is still alive.
> > >>>>>
> > >>>>> Normally device_shutdown() should call virtio-net shutdown before the
> > >>>>> IOMMU tear down and we wouldn't see any spurious transactions after
> > >>>>> iommu shutdown.
> > >>>>>
> > >>>>> With that fix, the above test case is fixed and I do not see spurious
> > >>>>> vhost IOTLB miss spurious requests.
> > >>>>>
> > >>>>> For more details, see qemu thread ([PATCH] hw/virtio/vhost: Disable
> > >>>>> IOTLB callbacks when IOMMU gets disabled,
> > >>>>> https://lore.kernel.org/all/20250120173339.865681-1-
> > >>>>> eric.auger@xxxxxxxxxx/)
> > >>>>>
> > >>>>>
> > >>>>> Reviewed-by: Eric Auger <eric.auger@xxxxxxxxxx>
> > >>>>> Tested-by: Eric Auger <eric.auger@xxxxxxxxxx>
> > >>>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>> Eric
> > >>>>>
> > >>>>>> ---
> > >>>>>>   drivers/virtio/virtio.c | 10 ++++++++++
> > >>>>>>   1 file changed, 10 insertions(+)
> > >>>>>>
> > >>>>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > >>>>>> index a9b93e99c23a..6c2f908eb22c 100644
> > >>>>>> --- a/drivers/virtio/virtio.c
> > >>>>>> +++ b/drivers/virtio/virtio.c
> > >>>>>> @@ -356,6 +356,15 @@ static void virtio_dev_remove(struct device *_d)
> > >>>>>>       of_node_put(dev->dev.of_node);
> > >>>>>>   }
> > >>>>>>   +static void virtio_dev_shutdown(struct device *_d)
> > >>>>>> +{
> > >>>>>> +    struct virtio_device *dev = dev_to_virtio(_d);
> > >>>>>> +    struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > >>>>>> +
> > >>>>>> +    if (drv && drv->remove)
> > >>>>>> +        drv->remove(dev);
> > >>>>
> > >>>>
> > >>>> I am concerned that full remove is a heavyweight operation.
> > >>>> Do not want to slow down reboots even more.
> > >>>> How about just doing a reset, instead?
> > >>>
> > >>> I tested with
> > >>>
> > >>> static void virtio_dev_shutdown(struct device *_d)
> > >>> {
> > >>>          struct virtio_device *dev = dev_to_virtio(_d);
> > >>>
> > >>>          virtio_reset_device(dev);
> > >>> }
> > >>>
> > >>>
> > >>> and it fixes my issue.
> > >>>
> > >>> Kirill, would that fix you issue too?
> > >
> > > Hi,
> > >
> > > sorry for my late response, I synced with Kirill offline and did a retest.
> > >
> > > The issue is still reproduced on my side, kexec will be stuck in case of
> > > "console=hvc0" append in kernel cmdline and even with such patch applied.
> >
> > Thanks for testing!
> >
> > Michael, it looks like the initial patch from Kyrill is the one that
> > fixes both issues. virtio_reset_device() usage does not work for the
> > initial bug report while it works for me. Other ideas?
> >
> > Thanks
> >
> > Eric
>
> Ah, wait a second.
>
> Looks like virtio-console continues to write to the MMIO even after
> underlying virtio-pci device is removed.

Where is such code? I think the virtcons_remove() is called so the
console is unregistered from the subsystem.

Or for surprise removal, we have break the device in:

static void virtio_pci_remove(struct pci_dev *pci_dev)
{
        struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
        struct device *dev = get_device(&vp_dev->vdev.dev);

        /*
         * Device is marked broken on surprise removal so that virtio upper
         * layers can abort any ongoing operation.
         */
        if (!pci_device_is_present(pci_dev))
                virtio_break_device(&vp_dev->vdev);

>
> Hmm. I am not sure why that is a problem, but I assume some hypervisors just
> hang the system if you try to kick them after reset.
> Unfortunate that spec did not disallow it.
>
> If we want to prevent that, we want to do something like this:
>
>
>         /*
>          * Some devices get wedged if you kick them after they are
>          * reset. Mark all vqs as broken to make sure we don't.
>          */
>         virtio_break_device(dev);
>         /*
>          * The below virtio_synchronize_cbs() guarantees that any
>          * interrupt for this line arriving after
>          * virtio_synchronize_vqs() has completed is guaranteed to see
>          * vq->broken as true.
>          */
>         virtio_synchronize_cbs(dev);

This seems to relate to doing this when we reintroduce the
notification hardening.

>         dev->config->reset(dev);
>
>
> I assume this still works for you, yes?
>

Thanks

>
>
> > >
> > > my kernel code base is 6.14.0-rc2.
> > >
> > > let me know if any more experiments needed.
> > >
> > > ---
> > >  drivers/virtio/virtio.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index ba37665188b5..f9f885d04763 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -395,6 +395,13 @@ static const struct cpumask
> > > *virtio_irq_get_affinity(struct device *_d,
> > >         return dev->config->get_vq_affinity(dev, irq_vec);
> > >  }
> > >
> > > +static void virtio_dev_shutdown(struct device *_d)
> > > +{
> > > +        struct virtio_device *dev = dev_to_virtio(_d);
> > > +
> > > +        virtio_reset_device(dev);
> > > +}
> > > +
> > >  static const struct bus_type virtio_bus = {
> > >         .name  = "virtio",
> > >         .match = virtio_dev_match,
> > > @@ -403,6 +410,7 @@ static const struct bus_type virtio_bus = {
> > >         .probe = virtio_dev_probe,
> > >         .remove = virtio_dev_remove,
> > >         .irq_get_affinity = virtio_irq_get_affinity,
> > > +       .shutdown = virtio_dev_shutdown,
> > >  };
> > >
> > >  int __register_virtio_driver(struct virtio_driver *driver, struct
> > > module *owner)
> > > --
> > > 2.43.0
> > >
> > >
> > >> gentle ping.
> > >>
> > >> this also fixes another issue with qemu vSMMU + virtio-scsi-pci. With
> > >> the above addition I get rid of spurious warning in qemu on guest reboot.
> > >>
> > >> qemu-system-aarch64: virtio: zero sized buffers are not allowed
> > >> qemu-system-aarch64: vhost vring error in virtqueue 0: Invalid
> > >> argument (22)
> > >>
> > >> Would you mind if I respin?
> > >>
> > >> Thanks
> > >>
> > >> Eric
> > >>
> > >>
> > >>
> > >>
> > >>>
> > >>> Thanks
> > >>>
> > >>> Eric
> > >>>>
> > >>>>>> +}
> > >>>>>> +
> > >>>>>>   static const struct bus_type virtio_bus = {
> > >>>>>>       .name  = "virtio",
> > >>>>>>       .match = virtio_dev_match,
> > >>>>>> @@ -363,6 +372,7 @@ static const struct bus_type virtio_bus = {
> > >>>>>>       .uevent = virtio_uevent,
> > >>>>>>       .probe = virtio_dev_probe,
> > >>>>>>       .remove = virtio_dev_remove,
> > >>>>>> +    .shutdown = virtio_dev_shutdown,
> > >>>>>>   };
> > >>>>>>     int __register_virtio_driver(struct virtio_driver *driver,
> > >>>>>> struct module *owner)
> > >>>>
> > >>>
> > >>
> > >
>