Re: [PATCH 2/3] qemu: special error code in case of no job on cancel block job

Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> · Fri, 30 Sep 2016 16:47:49 +0300

On 30.09.2016 00:11, John Ferlan wrote:
> 
> 
> On 09/27/2016 05:06 AM, Nikolay Shirokovskiy wrote:
>>
>>
>> On 26.09.2016 23:07, John Ferlan wrote:
>>>
>>>
>>> On 09/07/2016 05:24 AM, Nikolay Shirokovskiy wrote:
>>>> Special error code helps gracefully handle race conditions on
>>>> blockjob cancelling. Consider for example pull backup. We want
>>>> to stop it and in the process we cancel blockjob for some of the
>>>> exported disks. But at the time this command reaches qemu blockjob
>>>> fail for some reason and cancel result and result of stopping
>>>> will be an error with quite unexpecte message - no such job (sort of).
>>>> Instead we can detect this race and treat as correct cancelling
>>>> and wait until fail event will arrive. Then we can report corect
>>>> error message with some details from qemu probably.
>>>>
>>>> Users of domainBlockJobAbort can be affected as -2 is new possible
>>>> return code. Not sure if I should stick to the original behaviour in
>>>> this case.
>>>> ---
>>>>  src/qemu/qemu_monitor.c      |  5 +++++
>>>>  src/qemu/qemu_monitor_json.c | 11 +++++++----
>>>>  2 files changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>
>>> I haven't looked forward to patch 3/3 yet, but perhaps it'd be better in
>>> the "caller" that cares to :
>>>
>>>     if (qemu*() < 0) {
>>>         virErrorPtr err = virGetLastError();
>>>         if (err && err->code == VIR_ERR_OPERATION_INVALID) {
>>>             /* Do something specific */
>>>         }
>>>     }
>>>
>>> Returning -2 alters virDomainBlockJobAbort expected return values and
>>> well that's probably not a good idea since we don't know if some caller
>>> has said (if virDomainBlockJobAbort() == -1) instead of <= -1
>>>
>>
>> It is common place in libvirt to have special error codes instead of 0/-1
>> only (qemuMonitorJSONCheckCPUx86) and it feels bulky to have checks that
>> involve virGetLastError. It looks like other places that use similar 
>> construction do not have other choice because of rpc call or smth.
> 
> True - my concern was more that virDomainBlockJobAbort now has a new
> possible error value of -2 that's not described.  Still just adding to
> the documentation that it can now return -1 or -2 probably isn't the
> best idea especially if what you're trying to do is resolve/handle an
> issue for a specific case within qemu code.
> 
> Other API's within the bowels of qemu (monitor, monitor_json, etc)
> returning -2 usually is somehow handled by the caller, but before
> returning out into what would be API land would return what the API has
> documented ({0, -1}, {true, false}, {NULL, address}, {count, 0, -1}, etc.)
> 
> Changing an external API return value is a risky proposition - works for
> some instances fails for others... I've seen code that does this:
> 
>   if (public_api(arg1, arg2) == -1) {
>       print some error message
>       return failure
>   }
> 
> even though it's been strongly suggested to do something like
> 
>    if (public_api(arg1, arg2) < 0) {
>        get specific error (usually via errno) and perhaps
>           make a decision based on that... but fall through to
>        print some error message
>        return failure
>    }
> 
>>
>> I think when function spawns error when it comes to error code
>> value is does not take much care as they are not really semantic
>> (at least such common error as invalid operation) 
>> so one day the caller can catch 'operation invalid' for different
>> reason from different place on the stack.
>>
>> As to virDomainBlockJobAbort it is easy to patch)
>>
>> I think I should also remove reporting error from qemuMonitorJSONBlockJobError 
>> in case of DeviceNotActive however it is used from several places even
>> like setting speed. It is legacy of commit b976165c when
>> a lot of qemu commands share common error checking code. I think I'd
>> better copy this code to qemuMonitorJSONBlockJobCancel and change it there
>> instead of common place. What do you think?
>>
> 
> I think you need to think about all the callers of
> qemuMonitorJSONBlockJobError...  I almost hate to use the word fragile
> to describe things, but there is a lot of complexity involved with
> respect to all the consumers. I just recall hearing about someone doing
> a lot of muttering while implementing the code. The snapshot code in
> particular (which you seem to have copied to a degree) has been the
> source of some fairly "exotic" bug reports.

While snapshot and migration code was source of inspiration for
me I tried to keep understanding of how it applied to backups )) 
Particularly in the main patch I rewrote functions that cancel 
multiple blockjob operation a bit.

> 
> I personally don't have a lot of insight into the totality of the
> blockdev* code. I'd need to spend some time to really understand things
> better and provide a more clear/concise answer...
> 

Ok, so after all I'll change this patch to keep virDomainBlockJobAbort API
and will not touch qemuMonitorJSONBlockJobError (just spawn a copy for 
qemuMonitorJSONBlockJobCancel).

Nikolay

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list