On 05/03/2018 03:54 AM, Nikolay Shirokovskiy wrote: > > > On 01.05.2018 01:03, John Ferlan wrote: >> >> >> On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote: >>> Block job abort operation can not handle properly qemu crashes >>> when waiting for abort/pivot completion. Deadlock scenario is next: >>> >>> - qemuDomainBlockJobAbort waits for pivot/abort completion >>> - qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and >>> then waits for job condition (taken by qemuDomainBlockJobAbort) >>> - qemuDomainBlockJobAbort awakes but nothing really changed, VM is still >>> active (vm->def->id != -1) so thread starts waiting for completion again. >>> Now two threads are in deadlock. >>> >>> First let's add some condition besides domain active status so that waiting >>> thread can check it when awakes. Second let's move signalling to the place >>> where condition is set, namely monitor eof/error handlers. Having signalling >>> in qemuProcessBeginStopJob is not useful. >>> >>> The patch copies monitor error to domain state because at time >>> waiting thread awakes there can be no monitor and it is useful to >>> report monitor error to user. >>> >>> The patch has a drawback that on destroying a domain when waiting for >>> event from monitor we get not very convinient error message like >> >> convenient >> >>> 'EOF from monitor' from waiting API. On the other hand if qemu crashes >>> this is more useful then 'domain is not running'. The first case >>> will be addressed in another patch. >>> >>> The patch also fixes other places where we wait for event from qemu. >>> Namely handling device removal and tray ejection. Other places which >>> used virDomainObjWait (dump, migration, save) were good because of >>> async jobs which allow concurrent destroy job. >>> >>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >>> --- >>> src/conf/domain_conf.c | 43 ------------------------------------------- >>> src/conf/domain_conf.h | 3 --- >>> src/libvirt_private.syms | 2 -- >>> src/qemu/qemu_domain.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ >>> src/qemu/qemu_domain.h | 5 ++++- >>> src/qemu/qemu_driver.c | 6 +++--- >>> src/qemu/qemu_hotplug.c | 4 ++-- >>> src/qemu/qemu_migration.c | 12 ++++++------ >>> src/qemu/qemu_process.c | 27 ++++++++++++++++++++++----- >>> 9 files changed, 82 insertions(+), 65 deletions(-) >>> >> >> This no longer git am -3 applies and based on previous patch reviews, I >> think perhaps this needs to be redone. > > I'll resend as soon as we come to agreement on series. Now I'm not > convinced to change much (except for using distinct flag to indicate error > as mentioned in 2nd patch discussion, then I don't need 3d patch). > > See comments below. > >> >> I don't believe moving/renaming virDomainObjWait is good/right, but I'm >> sure there would be other opinions on that. Yes, QEMU is currently the >> only consumer... If it were to move, it should move to virdomainobjlist >> since that's where innards of virDomainObjPtr are managed. The fact that >> we look at ->parent.lock, well, <sigh>... > > I would not move the function without reason. It gets new name qemuDomainObjWait > becase it use qemu specific data, namely monError. Today only qemu uses this generic virDomainObjWait which is generic without the need to have/use qemu specific things. Other domain consumers could use it if they chose. I'm not in favor of moving, renaming, repurposing for a very specific issue for what is a generically used function. Maybe that's just me - so you could try to get a different reviewer if you want. > >> >> Anyway, you're attempting to special case something and perhaps you just >> need to create a qemuDomainObjWait that would call the timeout version >> of the virDomainObjWait and be able to handle whether some other error >> occurred. Wouldn't the LastError before the current wait return NULL >> (as in no error), then during the timeout period if LastError is >> something couldn't that indicate the failure you're looking for. > > I introduced qemuDomainObjWait not to put virDomainObjWait and virDomainObjWaitUntil > in the first place but to check monError. Then rather then keeping too functions > it's look more simple to use only one and branch on given timeout. > I would think if the goal was to have specific code for a specific purpose for specific functions, then introduce the qemuDomainObjWait, but have it call virDomainObjWait[Until] based on the need you have. Which in this case appears to fiddle with monError in some way. Again - that's just my opinion John >> >> I didn't spend a lot of time thinking about alternatives and how the >> code should change, but when you mention the patch has a drawback - that >> immediately raises concern. > > But ... I addressed this issue in next patch as I wrote. > [...] -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list