Re: [PATCH] virsh: improve waiting for block job readiness

Peter Krempa <pkrempa@xxxxxxxxxx> · Mon, 4 Jan 2016 16:09:16 +0100

On Thu, Dec 31, 2015 at 16:34:49 +1100, Michael Chapman wrote:
> Wait for a block job event after the job has reached 100% only if
> exactly one of the BLOCK_JOB and BLOCK_JOB_2 callbacks were successfully
> registered.
> 
> If neither callback was registered, then no event will ever be received.
> If both were registered, then any user-supplied path is guaranteed to
> match one of the events.
> 
> Signed-off-by: Michael Chapman <mike@xxxxxxxxxxxxxxxxx>
> ---
> 
> I have found that even a 2.5 second timeout isn't always sufficiently
> long for QEMU to flush a disk at the end of a block job.
> 
> I hope I've understood the code properly here, because as far as I can
> tell the comment I'm removing in this commit isn't right. The path the
> user supplies *has* to be either the <source file='name'/> or <target
> dev='name'/> exactly in order for the disk to be matched, and these are
> precisely the two strings used by the two events.
> 
>  tools/virsh-domain.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)

In addition to Andrea's review:

> 
> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index edbbc34..60de9ba 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -1942,18 +1942,20 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
>              goto cleanup;
>          }
>  
> -        /* since virsh can't guarantee that the path provided by the user will
> -         * later be matched in the event we will need to keep the fallback
> -         * approach and claim success if the block job finishes or vanishes. */
> -        if (result == 0)
> -            break;
> +        /* if either event could not be registered we can't guarantee that the
> +         * path provided by the user will be matched, so keep the fallback
> +         * approach and claim success if the block job finishes or vanishes */

The new statement is not true. If the user provides a filesystem path
different from what is in the definition but matching the same volume [1]
the callback will still return the path present in the configuration and
thus might not ever match and the job would hang forever.

[1] e.g. /path/to/image and /path/to/../to/image are equivalent but
won't be equal using strcmp. The problem is even more prominent if you
mix in some symlinks.

> +        if (data->cb_id2 < 0 || data->cb_id2 < 0) {
> +            if (result == 0)
> +                break;
>  
> -        /* for two-phase jobs we will try to wait in the synchronized phase
> -         * for event arrival since 100% completion doesn't necessarily mean that
> -         * the block job has finished and can be terminated with success */
> -        if (info.end == info.cur && --retries == 0) {
> -            ret = VIR_DOMAIN_BLOCK_JOB_READY;
> -            goto cleanup;
> +            /* only wait in the synchronized phase for event arrival if
> +             * either callback was registered */
> +            if (info.end == info.cur &&
> +                ((data->cb_id2 < 0 && data->cb_id2 < 0) || --retries == 0)) {
> +                ret = VIR_DOMAIN_BLOCK_JOB_READY;
> +                goto cleanup;
> +            }
>          }

NACK to this approach, if there was a way how this ugly stuff could be
avoided or deleted I'd already do so.

Peter
Attachment:
signature.asc

Description: Digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list