Hi On Wed, Aug 21, 2024 at 1:37 PM Ira Weiny <ira.weiny@xxxxxxxxx> wrote: > > Philip Chen wrote: > > Hi, > > > > On Tue, Aug 20, 2024 at 1:01 PM Dave Jiang <dave.jiang@xxxxxxxxx> wrote: > > > > > > > > > > > > On 8/20/24 10:22 AM, Philip Chen wrote: > > > > If a pmem device is in a bad status, the driver side could wait for > > > > host ack forever in virtio_pmem_flush(), causing the system to hang. > > > > > > > > So add a status check in the beginning of virtio_pmem_flush() to return > > > > early if the device is not activated. > > > > > > > > Signed-off-by: Philip Chen <philipchen@xxxxxxxxxxxx> > > > > --- > > > > > > > > v2: > > > > - Remove change id from the patch description > > > > - Add more details to the patch description > > > > > > > > drivers/nvdimm/nd_virtio.c | 9 +++++++++ > > > > 1 file changed, 9 insertions(+) > > > > > > > > diff --git a/drivers/nvdimm/nd_virtio.c b/drivers/nvdimm/nd_virtio.c > > > > index 35c8fbbba10e..97addba06539 100644 > > > > --- a/drivers/nvdimm/nd_virtio.c > > > > +++ b/drivers/nvdimm/nd_virtio.c > > > > @@ -44,6 +44,15 @@ static int virtio_pmem_flush(struct nd_region *nd_region) > > > > unsigned long flags; > > > > int err, err1; > > > > > > > > + /* > > > > + * Don't bother to submit the request to the device if the device is > > > > + * not acticated. > > > > > > s/acticated/activated/ > > > > Thanks for the review. > > I'll fix this typo in v3. > > > > In addition to this typo, does anyone have any other concerns? > > I'm not super familiar with the virtio-pmem workings and the needs reset > flag is barely used. > > Did you actually experience this hang? How was this found? What is the > user visible issue and how critical is it? Yes, I experienced the problem when trying to enable hibernation for a VM. In the typical hibernation flow, the kernel would try to: (1) freeze the processes (2) freeze the devices (3) take a snapshot of the memory (4) thaw the devices (5) write the snapshot to the storage (6) power off the system (or perform platform-specific action) In my case, I see VMM fail to re-activate the virtio_pmem device in (4). (And therefore the virtio_pmem device side sets VIRTIO_CONFIG_S_NEEDS_RESET.) As a result, when the kernel tries to power off the virtio_pmem device in (6), the system would hang in virtio_pmem_flush() if this patch is not added. To fix the root cause of this issue, I sent another patch to add freeze/restore PM callbacks to the virtio_pmem driver: https://lore.kernel.org/lkml/20240815004617.2325269-1-philipchen@xxxxxxxxxxxx/ (Please also help review that patch.) However, I think this patch is still helpful since the system shouldn't hang in virtio_pmem_flush() regardless of the device state. > > Thanks, > Ira > > > > > > > > > > + */ > > > > + if (vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_NEEDS_RESET) { > > > > + dev_info(&vdev->dev, "virtio pmem device needs a reset\n"); > > > > + return -EIO; > > > > + } > > > > + > > > > might_sleep(); > > > > req_data = kmalloc(sizeof(*req_data), GFP_KERNEL); > > > > if (!req_data) > >