On 04.05.2018 17:52, John Ferlan wrote: > > > On 05/03/2018 03:54 AM, Nikolay Shirokovskiy wrote: >> >> >> On 01.05.2018 01:03, John Ferlan wrote: >>> >>> >>> On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote: >>>> Block job abort operation can not handle properly qemu crashes >>>> when waiting for abort/pivot completion. Deadlock scenario is next: >>>> >>>> - qemuDomainBlockJobAbort waits for pivot/abort completion >>>> - qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and >>>> then waits for job condition (taken by qemuDomainBlockJobAbort) >>>> - qemuDomainBlockJobAbort awakes but nothing really changed, VM is still >>>> active (vm->def->id != -1) so thread starts waiting for completion again. >>>> Now two threads are in deadlock. >>>> >>>> First let's add some condition besides domain active status so that waiting >>>> thread can check it when awakes. Second let's move signalling to the place >>>> where condition is set, namely monitor eof/error handlers. Having signalling >>>> in qemuProcessBeginStopJob is not useful. >>>> >>>> The patch copies monitor error to domain state because at time >>>> waiting thread awakes there can be no monitor and it is useful to >>>> report monitor error to user. >>>> >>>> The patch has a drawback that on destroying a domain when waiting for >>>> event from monitor we get not very convinient error message like >>> >>> convenient >>> >>>> 'EOF from monitor' from waiting API. On the other hand if qemu crashes >>>> this is more useful then 'domain is not running'. The first case >>>> will be addressed in another patch. >>>> >>>> The patch also fixes other places where we wait for event from qemu. >>>> Namely handling device removal and tray ejection. Other places which >>>> used virDomainObjWait (dump, migration, save) were good because of >>>> async jobs which allow concurrent destroy job. >>>> >>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >>>> --- >>>> src/conf/domain_conf.c | 43 ------------------------------------------- >>>> src/conf/domain_conf.h | 3 --- >>>> src/libvirt_private.syms | 2 -- >>>> src/qemu/qemu_domain.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ >>>> src/qemu/qemu_domain.h | 5 ++++- >>>> src/qemu/qemu_driver.c | 6 +++--- >>>> src/qemu/qemu_hotplug.c | 4 ++-- >>>> src/qemu/qemu_migration.c | 12 ++++++------ >>>> src/qemu/qemu_process.c | 27 ++++++++++++++++++++++----- >>>> 9 files changed, 82 insertions(+), 65 deletions(-) >>>> >>> >>> This no longer git am -3 applies and based on previous patch reviews, I >>> think perhaps this needs to be redone. >> >> I'll resend as soon as we come to agreement on series. Now I'm not >> convinced to change much (except for using distinct flag to indicate error >> as mentioned in 2nd patch discussion, then I don't need 3d patch). >> >> See comments below. >> >>> >>> I don't believe moving/renaming virDomainObjWait is good/right, but I'm >>> sure there would be other opinions on that. Yes, QEMU is currently the >>> only consumer... If it were to move, it should move to virdomainobjlist >>> since that's where innards of virDomainObjPtr are managed. The fact that >>> we look at ->parent.lock, well, <sigh>... >> >> I would not move the function without reason. It gets new name qemuDomainObjWait >> becase it use qemu specific data, namely monError. > > Today only qemu uses this generic virDomainObjWait which is generic > without the need to have/use qemu specific things. Other domain > consumers could use it if they chose. > > I'm not in favor of moving, renaming, repurposing for a very specific > issue for what is a generically used function. Maybe that's just me - so > you could try to get a different reviewer if you want. Sorry. > > >> >>> >>> Anyway, you're attempting to special case something and perhaps you just >>> need to create a qemuDomainObjWait that would call the timeout version >>> of the virDomainObjWait and be able to handle whether some other error >>> occurred. Wouldn't the LastError before the current wait return NULL >>> (as in no error), then during the timeout period if LastError is >>> something couldn't that indicate the failure you're looking for. >> >> I introduced qemuDomainObjWait not to put virDomainObjWait and virDomainObjWaitUntil >> in the first place but to check monError. Then rather then keeping too functions >> it's look more simple to use only one and branch on given timeout. >> > > I would think if the goal was to have specific code for a specific > purpose for specific functions, then introduce the qemuDomainObjWait, > but have it call virDomainObjWait[Until] based on the need you have. > Which in this case appears to fiddle with monError in some way. > > Again - that's just my opinion > > John > Ok we can do that way. I just think that as virDomainObjWait/virDomainObjWaitUntil will not be used anywhere in code after the patch then sooner or later somebody will clean them up. Nikolay >>> >>> I didn't spend a lot of time thinking about alternatives and how the >>> code should change, but when you mention the patch has a drawback - that >>> immediately raises concern. >> >> But ... I addressed this issue in next patch as I wrote. >> > > > [...] > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list