On 30.09.2016 00:11, John Ferlan wrote: > > > On 09/27/2016 05:06 AM, Nikolay Shirokovskiy wrote: >> >> >> On 26.09.2016 23:07, John Ferlan wrote: >>> >>> >>> On 09/07/2016 05:24 AM, Nikolay Shirokovskiy wrote: >>>> Special error code helps gracefully handle race conditions on >>>> blockjob cancelling. Consider for example pull backup. We want >>>> to stop it and in the process we cancel blockjob for some of the >>>> exported disks. But at the time this command reaches qemu blockjob >>>> fail for some reason and cancel result and result of stopping >>>> will be an error with quite unexpecte message - no such job (sort of). >>>> Instead we can detect this race and treat as correct cancelling >>>> and wait until fail event will arrive. Then we can report corect >>>> error message with some details from qemu probably. >>>> >>>> Users of domainBlockJobAbort can be affected as -2 is new possible >>>> return code. Not sure if I should stick to the original behaviour in >>>> this case. >>>> --- >>>> src/qemu/qemu_monitor.c | 5 +++++ >>>> src/qemu/qemu_monitor_json.c | 11 +++++++---- >>>> 2 files changed, 12 insertions(+), 4 deletions(-) >>>> >>> >>> I haven't looked forward to patch 3/3 yet, but perhaps it'd be better in >>> the "caller" that cares to : >>> >>> if (qemu*() < 0) { >>> virErrorPtr err = virGetLastError(); >>> if (err && err->code == VIR_ERR_OPERATION_INVALID) { >>> /* Do something specific */ >>> } >>> } >>> >>> Returning -2 alters virDomainBlockJobAbort expected return values and >>> well that's probably not a good idea since we don't know if some caller >>> has said (if virDomainBlockJobAbort() == -1) instead of <= -1 >>> >> >> It is common place in libvirt to have special error codes instead of 0/-1 >> only (qemuMonitorJSONCheckCPUx86) and it feels bulky to have checks that >> involve virGetLastError. It looks like other places that use similar >> construction do not have other choice because of rpc call or smth. > > True - my concern was more that virDomainBlockJobAbort now has a new > possible error value of -2 that's not described. Still just adding to > the documentation that it can now return -1 or -2 probably isn't the > best idea especially if what you're trying to do is resolve/handle an > issue for a specific case within qemu code. > > Other API's within the bowels of qemu (monitor, monitor_json, etc) > returning -2 usually is somehow handled by the caller, but before > returning out into what would be API land would return what the API has > documented ({0, -1}, {true, false}, {NULL, address}, {count, 0, -1}, etc.) > > Changing an external API return value is a risky proposition - works for > some instances fails for others... I've seen code that does this: > > if (public_api(arg1, arg2) == -1) { > print some error message > return failure > } > > even though it's been strongly suggested to do something like > > if (public_api(arg1, arg2) < 0) { > get specific error (usually via errno) and perhaps > make a decision based on that... but fall through to > print some error message > return failure > } > >> >> I think when function spawns error when it comes to error code >> value is does not take much care as they are not really semantic >> (at least such common error as invalid operation) >> so one day the caller can catch 'operation invalid' for different >> reason from different place on the stack. >> >> As to virDomainBlockJobAbort it is easy to patch) >> >> I think I should also remove reporting error from qemuMonitorJSONBlockJobError >> in case of DeviceNotActive however it is used from several places even >> like setting speed. It is legacy of commit b976165c when >> a lot of qemu commands share common error checking code. I think I'd >> better copy this code to qemuMonitorJSONBlockJobCancel and change it there >> instead of common place. What do you think? >> > > I think you need to think about all the callers of > qemuMonitorJSONBlockJobError... I almost hate to use the word fragile > to describe things, but there is a lot of complexity involved with > respect to all the consumers. I just recall hearing about someone doing > a lot of muttering while implementing the code. The snapshot code in > particular (which you seem to have copied to a degree) has been the > source of some fairly "exotic" bug reports. While snapshot and migration code was source of inspiration for me I tried to keep understanding of how it applied to backups )) Particularly in the main patch I rewrote functions that cancel multiple blockjob operation a bit. > > I personally don't have a lot of insight into the totality of the > blockdev* code. I'd need to spend some time to really understand things > better and provide a more clear/concise answer... > Ok, so after all I'll change this patch to keep virDomainBlockJobAbort API and will not touch qemuMonitorJSONBlockJobError (just spawn a copy for qemuMonitorJSONBlockJobCancel). Nikolay -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list