On 2020/05/04 17:13, Michal Privoznik wrote:
>On 5/4/20 10:07 AM, Peter Krempa wrote:
>> On Fri, May 01, 2020 at 16:09:04 +0900, MIKI Nobuhiro wrote:
>>> The waiting time to acquire the lock times out, which leads to a segment fault.
>>
>> Could you please elaborate here? Adding this band-aid is pointless if it
>> can timeout later. We do want to fix any locking issue but without
>> information we can't really.
>>
>>> In essence we should make improvements around locks, but as a workaround we
>>> will change the timeout to allow the user to increase it.
>>> This value was defined as 30 seconds, so use it as the default value.
>>> The logs are as follows:
>>>
>>> ```
>>> Timed out during operation: cannot acquire state change lock \
>>> (held by monitor=remoteDispatchDomainCreateWithFlags)
>>> libvirtd.service: main process exited, code=killed,status=11/SEGV
>>> ```
>>
>> Unfortunately I don't consider this a proper justification for the
>> change below. Either re-state why you want this, e.g. saying that
>> shortening time may give users greater feedback, but mentioning that it
>> works around a crash is not acceptable as a justification for something
>> which doesn't fix the crash.
>
>Agreed. Allowing users to configure the timeout makes sense - we already
>do that for other timeouts, but if it is masking a real bug we need to
>fix it first. Do you have any steps to reproduce the bug? Are you able
>to get the stack trace from the coredump?

Here is a stack trace from the coredump. However, I tested again today on the
master branch (commit eea5d63a221a8f36a3ed5b1189fe619d4fa1fde2) and every
virtual machine booted successfully, so it seems this bug has already been
fixed. I apologize for any time you may have spent on this.

(gdb) p mon
$1 = (qemuMonitor *) 0x7fe0dc0142e0
(gdb) p mon->msg
$2 = (qemuMonitorMessagePtr) 0x0
# I suspect that mon is shared between worker threads and some thread may
# have set mon->msg = NULL; a standalone sketch of that pattern follows the
# backtrace below.
(gdb) bt
#0  qemuMonitorSend (mon=mon@entry=0x7fe0dc0142e0, msg=msg@entry=0x7fe0e3f32350)
    at qemu/qemu_monitor.c:981
#1  0x00007fe0d23c4428 in qemuMonitorJSONCommandWithFd (mon=0x7fe0dc0142e0,
    cmd=cmd@entry=0x7fe0dc014660, scm_fd=scm_fd@entry=-1,
    reply=reply@entry=0x7fe0e3f323e0) at qemu/qemu_monitor_json.c:333
#2  0x00007fe0d23c61cf in qemuMonitorJSONCommand (reply=0x7fe0e3f323e0,
    cmd=0x7fe0dc014660, mon=<optimized out>) at qemu/qemu_monitor_json.c:358
#3  qemuMonitorJSONSetCapabilities (mon=<optimized out>)
    at qemu/qemu_monitor_json.c:1611
#4  0x00007fe0d23b6453 in qemuMonitorSetCapabilities (mon=<optimized out>)
    at qemu/qemu_monitor.c:1582
#5  0x00007fe0d2394e43 in qemuProcessInitMonitor (asyncJob=QEMU_ASYNC_JOB_START,
    vm=0x7fe0cc028670, driver=0x7fe0801290c0) at qemu/qemu_process.c:1928
#6  qemuConnectMonitor (driver=driver@entry=0x7fe0801290c0,
    vm=vm@entry=0x7fe0cc028670, asyncJob=asyncJob@entry=6,
    retry=retry@entry=false, logCtxt=logCtxt@entry=0x7fe0dc044b40)
    at qemu/qemu_process.c:2003
#7  0x00007fe0d239b69c in qemuProcessWaitForMonitor (logCtxt=0x7fe0dc044b40,
    asyncJob=6, vm=0x7fe0cc028670, driver=0x7fe0801290c0)
    at qemu/qemu_process.c:2413
#8  qemuProcessLaunch (conn=conn@entry=0x7fe0c4000a00,
    driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670,
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0,
    snapshot=snapshot@entry=0x0,
    vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=flags@entry=17)
    at qemu/qemu_process.c:6993
#9  0x00007fe0d239f8f2 in qemuProcessStart (conn=conn@entry=0x7fe0c4000a00,
    driver=driver@entry=0x7fe0801290c0, vm=vm@entry=0x7fe0cc028670,
    updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START,
    migrateFrom=migrateFrom@entry=0x0, migrateFd=migrateFd@entry=-1,
    migratePath=migratePath@entry=0x0, snapshot=snapshot@entry=0x0,
    vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17, flags@entry=1)
    at qemu/qemu_process.c:7230
#10 0x00007fe0d2402d59 in qemuDomainObjStart (conn=0x7fe0c4000a00,
    driver=driver@entry=0x7fe0801290c0, vm=0x7fe0cc028670, flags=flags@entry=0,
    asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7650
#11 0x00007fe0d2403436 in qemuDomainCreateWithFlags (dom=0x7fe0dc0050d0, flags=0)
    at qemu/qemu_driver.c:7703
#12 0x00007fe0f394f88d in virDomainCreateWithFlags (domain=domain@entry=0x7fe0dc0050d0,
    flags=0) at libvirt-domain.c:6600
#13 0x000055d9e00348a2 in remoteDispatchDomainCreateWithFlags (server=0x55d9e1c95140,
    msg=0x55d9e1cb7d10, ret=0x7fe0dc004b80, args=0x7fe0dc005110,
    rerr=0x7fe0e3f32c10, client=<optimized out>)
    at remote/remote_daemon_dispatch_stubs.h:4819
#14 remoteDispatchDomainCreateWithFlagsHelper (server=0x55d9e1c95140,
    client=<optimized out>, msg=0x55d9e1cb7d10, rerr=0x7fe0e3f32c10,
    args=0x7fe0dc005110, ret=0x7fe0dc004b80)
    at remote/remote_daemon_dispatch_stubs.h:4797
#15 0x00007fe0f387c0d9 in virNetServerProgramDispatchCall (msg=0x55d9e1cb7d10,
    client=0x55d9e1cb6ce0, server=0x55d9e1c95140, prog=0x55d9e1cb3a40)
    at rpc/virnetserverprogram.c:435
#16 virNetServerProgramDispatch (prog=0x55d9e1cb3a40,
    server=server@entry=0x55d9e1c95140, client=0x55d9e1cb6ce0, msg=0x55d9e1cb7d10)
    at rpc/virnetserverprogram.c:302
#17 0x00007fe0f388137d in virNetServerProcessMsg (msg=<optimized out>,
    prog=<optimized out>, client=<optimized out>, srv=0x55d9e1c95140)
    at rpc/virnetserver.c:137
#18 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55d9e1c95140)
    at rpc/virnetserver.c:158
#19 0x00007fe0f37a9c31 in virThreadPoolWorker (opaque=opaque@entry=0x55d9e1c94e50)
    at util/virthreadpool.c:163
#20 0x00007fe0f37a9038 in virThreadHelper (data=<optimized out>)
    at util/virthread.c:196
#21 0x00007fe0f0d8ce65 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fe0f0ab588d in clone () from /lib64/libc.so.6

>> Changes to news.xml always must be in a separate commit.
>
>Just a short explanation - this is to ease possible backports. For
>instance, if there is a bug fix in version X, but a distro wants to
>backport it to version X-1, then the news.xml looks completely different
>there and the cherry-pick won't apply cleanly.

Thank you for your reviews. I think making the timeout configurable might
still be useful in other situations, so I will rework this patch and submit
it again.
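
For reference, here is a minimal, standalone sketch of the kind of race I
suspect. This is not libvirt code: the "monitor" struct and the
sender_thread()/resetter_thread() functions are made-up names, and the real
interaction inside qemuMonitorSend() may well be different; it only
illustrates how one worker thread clearing a shared message pointer can crash
another thread that still dereferences it.

```c
/* Hypothetical illustration only -- "monitor", "sender_thread" and
 * "resetter_thread" are made-up names, not libvirt identifiers. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct monitor {
    pthread_mutex_t lock;
    char *msg;                  /* shared between threads */
};

/* Unsafe: dereferences mon->msg without holding mon->lock, so it races
 * with resetter_thread() below. */
static void *sender_thread(void *opaque)
{
    struct monitor *mon = opaque;
    /* If the resetter ran first, mon->msg is NULL here and strlen()
     * dereferences a NULL pointer -> SIGSEGV, as in the coredump. */
    printf("sending %zu bytes\n", strlen(mon->msg));
    return NULL;
}

static void *resetter_thread(void *opaque)
{
    struct monitor *mon = opaque;
    pthread_mutex_lock(&mon->lock);
    free(mon->msg);
    mon->msg = NULL;            /* what I suspect happens to mon->msg */
    pthread_mutex_unlock(&mon->lock);
    return NULL;
}

int main(void)
{
    struct monitor mon = { PTHREAD_MUTEX_INITIALIZER, strdup("query-status") };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, resetter_thread, &mon);
    pthread_create(&t2, NULL, sender_thread, &mon);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_mutex_destroy(&mon.lock);
    return 0;
}
```

Built with `gcc -pthread`, running it a few times may or may not crash,
depending on which thread wins the race, which is consistent with the crash
being hard to reproduce.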
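
Roughly, what I have in mind for the reworked patch is the usual pattern of
keeping the current hard-coded 30 seconds as the compiled-in default and only
overriding it when the admin sets a value in the driver configuration. A
minimal standalone sketch of that pattern (the names below are placeholders,
not the actual libvirt identifiers or config keys):

```c
/* Placeholder names only -- DEFAULT_JOB_WAIT_SECONDS, driver_config and
 * effective_job_wait() are not real libvirt identifiers. */
#include <stdio.h>

#define DEFAULT_JOB_WAIT_SECONDS 30   /* the value that is hard-coded today */

struct driver_config {
    unsigned int job_wait_seconds;    /* 0 means "not set in the config file" */
};

/* Use the admin-provided value when present, otherwise keep the old default. */
static unsigned int effective_job_wait(const struct driver_config *cfg)
{
    return cfg->job_wait_seconds ? cfg->job_wait_seconds
                                 : DEFAULT_JOB_WAIT_SECONDS;
}

int main(void)
{
    struct driver_config unset = { 0 };    /* nothing configured */
    struct driver_config tuned = { 120 };  /* admin raised the timeout */

    printf("default: %u s\n", effective_job_wait(&unset));  /* prints 30 */
    printf("tuned:   %u s\n", effective_job_wait(&tuned));  /* prints 120 */
    return 0;
}
```

That way existing installations keep the current behaviour unless they
explicitly opt in to a longer (or shorter) timeout.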