On Tue, 16 Apr 2019 17:23:14 -0400 Farhan Ali <alifm@xxxxxxxxxxxxx> wrote: > The quiesce function calls cio_cancel_halt_clear() and if we > get an -EBUSY we go into a loop where we: > - wait for any interrupts > - flush all I/O in the workqueue > - retry cio_cancel_halt_clear > > During the period where we are waiting for interrupts or > flushing all I/O, the channel subsystem could have completed > a halt/clear action and turned off the corresponding activity > control bits in the subchannel status word. This means the next > time we call cio_cancel_halt_clear(), we will again start by > calling cancel subchannel and so we can be stuck between calling > cancel and halt forever. > > Rather than calling cio_cancel_halt_clear() immediately after > waiting, let's try to disable the subchannel. If we succeed in > disabling the subchannel then we know nothing else can happen > with the device. > > Suggested-by: Eric Farman <farman@xxxxxxxxxxxxx> > Signed-off-by: Farhan Ali <alifm@xxxxxxxxxxxxx> > --- > ChangeLog: > v2 -> v3 > - Log an error message when cio_cancel_halt_clear > returns EIO and break out of the loop. > > - Did not include past change log as the other patches > of the original series have been queued by Conny. > Old series (v2) can be found here: > https://marc.info/?l=kvm&m=155475754101769&w=2 > > drivers/s390/cio/vfio_ccw_drv.c | 32 ++++++++++++++++++-------------- > 1 file changed, 18 insertions(+), 14 deletions(-) > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c > index 78517aa..66a66ac 100644 > --- a/drivers/s390/cio/vfio_ccw_drv.c > +++ b/drivers/s390/cio/vfio_ccw_drv.c > @@ -43,26 +43,30 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch) > if (ret != -EBUSY) > goto out_unlock; > > + iretry = 255; > do { > - iretry = 255; > > ret = cio_cancel_halt_clear(sch, &iretry); > - while (ret == -EBUSY) { > - /* > - * Flush all I/O and wait for > - * cancel/halt/clear completion. > - */ > - private->completion = &completion; > - spin_unlock_irq(sch->lock); > > - wait_for_completion_timeout(&completion, 3*HZ); > + if (ret == -EIO) { > + pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", > + sch->schid.ssid, sch->schid.sch_no); What about using dev_err(&sch->dev, "could not quiesce"); instead? (Can make that change while applying, no need to resend for that.) > + break; > + } > + > + /* > + * Flush all I/O and wait for > + * cancel/halt/clear completion. > + */ > + private->completion = &completion; > + spin_unlock_irq(sch->lock); > > - private->completion = NULL; > - flush_workqueue(vfio_ccw_work_q); > - spin_lock_irq(sch->lock); > - ret = cio_cancel_halt_clear(sch, &iretry); > - }; > + if (ret == -EBUSY) > + wait_for_completion_timeout(&completion, 3*HZ); > > + private->completion = NULL; > + flush_workqueue(vfio_ccw_work_q); > + spin_lock_irq(sch->lock); > ret = cio_disable_subchannel(sch); > } while (ret == -EBUSY); > out_unlock: Otherwise, looks good to me. Will queue when I get some ack/r-b.