Re: [PATCH v2] virtio_pmem: Check device status before requesting flush

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

On Wed, Aug 21, 2024 at 1:37 PM Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
>
> Philip Chen wrote:
> > Hi,
> >
> > On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 8/20/24 10:22 AM, Philip Chen wrote:
> > > > If a pmem device is in a bad status, the driver side could wait for
> > > > host ack forever in virtio_pmem_flush(), causing the system to hang.
> > > >
> > > > So add a status check in the beginning of virtio_pmem_flush() to return
> > > > early if the device is not activated.
> > > >
> > > > Signed-off-by: Philip Chen <philipchen@xxxxxxxxxxxx>
> > > > ---
> > > >
> > > > v2:
> > > > - Remove change id from the patch description
> > > > - Add more details to the patch description
> > > >
> > > >  drivers/nvdimm/nd_virtio.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c
> > > > index 35c8fbbba10e..97addba06539 100644
> > > > --- a/drivers/nvdimm/nd_virtio.c
> > > > +++ b/drivers/nvdimm/nd_virtio.c
> > > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region)
> > > >       unsigned long flags;
> > > >       int err, err1;
> > > >
> > > > +     /*
> > > > +      * Don't bother to submit the request to the device if the device is
> > > > +      * not acticated.
> > >
> > > s/acticated/activated/
> >
> > Thanks for the review.
> > I'll fix this typo in v3.
> >
> > In addition to this typo, does anyone have any other concerns?
>
> I'm not super familiar with the virtio-pmem workings and the needs reset
> flag is barely used.
>
> Did you actually experience this hang?  How was this found?  What is the
> user visible issue and how critical is it?

Yes, I experienced the problem when trying to enable hibernation for a VM.

In the typical hibernation flow, the kernel would try to:
(1) freeze the processes
(2) freeze the devices
(3) take a snapshot of the memory
(4) thaw the devices
(5) write the snapshot to the storage
(6) power off the system (or perform platform-specific action)

In my case, I see VMM fail to re-activate the virtio_pmem device in (4).
(And therefore the virtio_pmem device side sets VIRTIO_CONFIG_S_NEEDS_RESET.)
As a result, when the kernel tries to power off the virtio_pmem device
in (6), the system would hang in virtio_pmem_flush() if this patch is
not added.

To fix the root cause of this issue, I sent another patch to add
freeze/restore PM callbacks to the virtio_pmem driver:
https://lore.kernel.org/lkml/20240815004617.2325269-1-philipchen@xxxxxxxxxxxx/
(Please also help review that patch.)

However, I think this patch is still helpful since the system
shouldn't hang in virtio_pmem_flush() regardless of the device state.

>
> Thanks,
> Ira
>
> >
> > >
> > > > +      */
> > > > +     if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) {
> > > > +             dev_info(&vdev->dev, "virtio pmem device needs a reset\n");
> > > > +             return -EIO;
> > > > +     }
> > > > +
> > > >       might_sleep();
> > > >       req_data = kmalloc(sizeof(*req_data), GFP_KERNEL);
> > > >       if (!req_data)
>
>





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux