On 01/18/2018 08:25 AM, Ján Tomko wrote:
> On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
>> Hello libvirt-users list,
>>
>> We've been hitting the same bug since version 3.4.0 (3.3.0 works OK).
>> We have a process that is permanently connected to libvirtd via a
>> socket; it collects stats, listens for events, and controls the VPSes.
>>
>> When we try to 'shutdown' a number of VPSes, we often hit the bug.
>> One of the VPSes gets stuck in the 'in shutdown' state, no related
>> 'qemu' process is present, and the following errors appear in the log:
>>
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000:
>> 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status
>> data
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>> 20441: error : virFileReadAll:1420 : Failed to open file
>> '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>> No such file or directory
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>> 20441: error : virCgroupGetValueStr:844 : Unable to read from
>> '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>> No such file or directory
>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>> 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu
>> account: Operation not permitted
>> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>> 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job
>> (destroy, none) for domain DOMAIN1; current job is (query, none) owned
>> by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
>> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>> 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during
>> operation: cannot acquire state change lock (held by
>> remoteDispatchConnectGetAllDomainStats)
>>
>> I think only the last line matters. The bug is highly reproducible;
>> we can easily trigger it even by calling 'virsh shutdown' several
>> times in a shell, one call after another.
>>
>> When we shut down the process connected to the socket, everything
>> becomes OK and the bug is gone.
>>
>> The system used is Gentoo Linux. We tried all recent versions of
>> libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, 4.0.0-rc2 (today's
>> version from git)) and they all have this bug. 3.3.0 works OK.
>>
>
> I don't see anything obviously stats-related in the diff between 3.3.0
> and 3.4.0. We did add reporting of the shutdown reason, but that's just
> parsing one more JSON reply we previously ignored.
>
> Can you try running 'git bisect' to pinpoint the exact commit that
> caused this issue?

I am able to reproduce this issue. I ran a bisect and found that the
commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1:

https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413acc61bbea938b40499e1;hp=ec337aee9b20091d6f9f60b78f210d55f812500b

However, it's very unlikely that this commit itself is causing the
error; if anything, it merely exposes a bug that was already there.
That said, if I revert the commit on top of current HEAD, I can no
longer reproduce the issue.

Michal
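
For anyone trying to reproduce this without a dedicated management
process, here is a minimal sketch, under the assumption that a tight
stats-polling loop is enough to trigger the race (DOMAIN1 through
DOMAIN5 are placeholder guest names). 'virsh domstats' drives the same
all-domain-stats API that shows up as the lock holder
(remoteDispatchConnectGetAllDomainStats) in the log above.

  # Terminal 1: poll stats in a tight loop, as a monitoring client would.
  while true; do
      virsh domstats > /dev/null
  done

  # Terminal 2: shut down several guests back to back ...
  for dom in DOMAIN1 DOMAIN2 DOMAIN3 DOMAIN4 DOMAIN5; do
      virsh shutdown "$dom"
  done

  # ... then check whether one of them is stuck in 'in shutdown':
  virsh list --all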
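
The bisect Ján suggested follows the usual pattern; sketched below is a
session run from a libvirt git checkout, using the v3.3.0 and v3.4.0
release tags as the known-good and known-bad endpoints.

  git bisect start
  git bisect bad v3.4.0     # first release that shows the hang
  git bisect good v3.3.0    # last release that works

  # git now checks out a candidate commit at each step: build and
  # install it, restart libvirtd, re-run the reproducer, then report
  # the result until git names the first bad commit.
  git bisect good           # or: git bisect bad
  git bisect reset          # return to the original branch when done

  # Cross-check of Michal's observation: revert the bisected commit on
  # top of current HEAD and confirm the hang disappears.
  git revert aeda1b8c56dc58b0a413acc61bbea938b40499e1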