On 01/18/2018 04:49 PM, Michal Privoznik wrote:
> On 01/18/2018 08:25 AM, Ján Tomko wrote:
>> On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
>>> Hello libvirt-users list,
>>>
>>> We've been catching the same bug since version 3.4.0 (3.3.0 works OK).
>>> So, we have a process that is permanently connected to libvirtd via its
>>> socket; it collects stats, listens to events and controls the VPSes.
>>>
>>> When we try to 'shutdown' a number of VPSes we often catch the bug.
>>> One of the VPSes gets stuck in the 'in shutdown' state, no related
>>> 'qemu' process is present, and the following errors appear in the log:
>>>
>>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000:
>>> 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status
>>> data
>>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>>> 20441: error : virFileReadAll:1420 : Failed to open file
>>> '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>>> No such file or directory
>>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>>> 20441: error : virCgroupGetValueStr:844 : Unable to read from
>>> '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage':
>>> No such file or directory
>>> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000:
>>> 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu
>>> account: Operation not permitted
>>> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>>> 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job
>>> (destroy, none) for domain DOMAIN1; current job is (query, none) owned by
>>> (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
>>> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000:
>>> 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during
>>> operation: cannot acquire state change lock (held by
>>> remoteDispatchConnectGetAllDomainStats)
>>>
>>> I think only the last line matters.
>>> The bug is highly reproducible. We can easily catch it even when we
>>> run multiple 'virsh shutdown' commands in a shell, one by one.
>>>
>>> When we shut down the process connected to the socket, everything
>>> becomes OK and the bug is gone.
>>>
>>> The system used is Gentoo Linux; we have tried all recent versions of
>>> libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, 4.0.0-rc2 (today's
>>> version from git)) and they all have this bug. 3.3.0 works OK.
>>>
>> I don't see anything obviously stats-related in the diff between 3.3.0
>> and 3.4.0. We added reporting of the shutdown reason, but that's just
>> parsing one more JSON reply we previously ignored.
>>
>> Can you try running 'git bisect' to pinpoint the exact commit that
>> caused this issue?
>
> I am able to reproduce this issue. I ran bisect and found that the
> commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1.
>
> https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413acc61bbea938b40499e1;hp=ec337aee9b20091d6f9f60b78f210d55f812500b
>
> But it's very unlikely that this commit is causing the error. If
> anything, it is just exposing whatever error we already have there.
> I mean, if I revert the commit on top of the current HEAD, I can no
> longer reproduce the issue.

Hi,

it looks like we hit the same issue in oVirt; see for example
https://bugzilla.redhat.com/show_bug.cgi?id=1532277

Is there a fix planned, and/or a BZ entry I can track?
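For anyone else trying to trigger this locally, a minimal sketch of the setup
described above (a long-lived client polling virConnectGetAllDomainStats while
guests are shut down) could look like the following. It uses the libvirt Python
bindings; the connection URI, sleep intervals and the way the guests are picked
are illustrative assumptions, not taken from the reporter's actual monitoring
process:

import threading
import time

import libvirt  # libvirt-python bindings


def poll_stats(conn, stop):
    # Mimic the monitoring process that stays permanently connected to the
    # libvirtd socket and keeps collecting per-domain statistics.
    while not stop.is_set():
        try:
            conn.getAllDomainStats()  # maps to virConnectGetAllDomainStats
        except libvirt.libvirtError as e:
            print("stats error:", e)
        time.sleep(0.1)


def main():
    # Hypothetical local setup: several running QEMU guests on qemu:///system.
    conn = libvirt.open("qemu:///system")
    stop = threading.Event()
    poller = threading.Thread(target=poll_stats, args=(conn, stop))
    poller.start()

    # Shut the guests down one by one while the stats poller keeps running;
    # on the affected libvirt versions some domains may end up stuck
    # "in shutdown" with the state change lock reported as held by
    # remoteDispatchConnectGetAllDomainStats.
    for dom in conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
        try:
            dom.shutdown()
        except libvirt.libvirtError as e:
            print("shutdown error for", dom.name(), ":", e)

    time.sleep(60)  # give the guests time to shut down (or to get stuck)
    stop.set()
    poller.join()
    conn.close()


if __name__ == "__main__":
    main()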
Thanks,

-- 
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh