On 17/07/14 17:54, Ján Tomko wrote: > On 07/17/2014 08:51 AM, Jiri Denemark wrote: >> On Thu, Jul 17, 2014 at 13:29:52 +1000, Sam Bobroff wrote: >>> On 17/07/14 12:50, Eric Blake wrote: >>>> On 07/16/2014 07:52 PM, Sam Bobroff wrote: >>>>> Hello everyone, >>>> >>>> [Can you configure your mailer to wrap long lines?] >>> >>> [No problem, done.] >>> >>>>> After performing a migration, the syslog often contains several >>>>> messages like this: >>>>> >>>>> "This thread seems to be the async job owner; entering monitor >>>>> without asking for a nested job is dangerous" >>>> >>>> The sign of a bug that we need to fix. What version of libvirt are >>>> you using? We may have already fixed it. >>> >>> I've been testing on 1.2.6, but I will re-test on the latest code from git. >> >> Hmm, it what were you doing when the message appeared in the log? I >> fixed wrong usage of our job tracking APIs long time ago >> (e4e28220b572cf4c71d07740c24150411f0704a6 in 1.0.3) so I guess there >> must be another place we don't do it correctly. >> > > That commit was essentially reverted by my commit 9846402 when fixing > migration crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1018267 > > I think the message was harmless during migration as we don't allow other jobs > (except destroy). > > Jan I've re-tested this on the latest libvirt from git (1.2.7) and I still see several occurrences of the "dangerous" message on the destination machine after performing a live migration with --copy-storage-all. It seems like qemuDomainObjEnterMonitorInternal() is expecting the asyncJob parameter to have the value of the currently running async job, and in the cases that cause the warning message, it's being called via qemuDomainObjEnterMonitor() which hard codes it to QEMU_ASYNC_NONE. It looks like these calls should be changed to qemuDomainObjEnterMonitorAsync() with the right asyncJob parameter, which I believe is QEMU_ASYNC_JOB_MIGRATION_IN for the cases I'm looking at (and the old value of QEMU_ASYNC_NONE for all other callers). There aren't many cases of this, so fixing it this way is fairly simple but it does involve passing the asyncJob through some intermediate functions. Another approach would be to change qemuDomainObjEnterMonitorInternal() so that it can automatically do the same thing: It could see that there was an async job running, and that it's owned by the current thread, and then call qemuDomainObjBeginNestedJob() (or directly call qemuDomainObjBeginJobInternal() since we know that the checks in qemuDomainObjBeginNestedJob() are going to succeed). It seems like this might also simplify some other code that is doing pretty much just what I described above. The second approach does fix all the cases I'm looking at, and would also fix other similar cases I haven't yet found, but I'm not sure that it would be entirely safe or if it's circumventing some sort of safety checking. Any comments? Which approach should I pursue? Cheers, Sam. -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list