On Mon, Jun 20, 2016 at 12:28 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Hi,
>
> I`ve observed this issue previously on an old 3.10 branch but wrote it
> off due to inability to reproduce in any meaningful way. Currently I
> am seeing it on 3.10 branch where all KVM-related and RCU-related
> issues are patched more or less for well-known issues.
>
> Way to obtain a problematic state:
> - run a hypervisor for essentially long time, it took a year and half
> previously for issue to come on the mentioned old branch, but for
> newer kernel and probably due to higher load it took roughly a half of
> a year,
> - suddenly a single VM obtains a lock and became unresponsive while
> all threads displaying Running state, under this lock VM is neither
> not killable via SIGKILL and not freezeable via freezer cgroup, the
> only obvious symptoms is that it does not consume any cpu cycles
> anymore (no counter inside sched info ) and of course it is
> non-debuggable anymore. As it follows, it is quite impossible to say
> at a glance where lock sits, as there is no distinctive processes
> which are at least sleeping and could be moved out of sight.
>
> It looks like I could have met pure scheduler issue, so if nothing
> from attached recursive stack/status dump would click on an idea, I`d
> CC scheduler folks. Timer/RCU configs are attached for the
> convenience.
>
> Thanks for looking into this!
>
> stack:
> http://xdel.ru/downloads/vm-sched-hang/stack.txt
> status:
> http://xdel.ru/downloads/vm-sched-hang/status.txt

-qemu-devel

So far the issue seems to be a matter of errata in a specific CPU series (E5 2620v1 made within the first two quarters after the official announcement, though the Intel erratum document for this CPU does not contain any similar highlights).
Nevertheless, the issue should be dealt with properly, but I am a bit out of ideas for further debugging: it is not a clear scheduler lockup, and the running state displayed by the OS is nothing more than a side effect of this bug.
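
For anyone who wants to check whether a guest is in the same state (threads reported as Running while accumulating no CPU time), here is a rough sketch of how the symptom could be detected from userspace. It is only an illustration, not the tooling used to produce the dumps above. It assumes the kernel exposes /proc/<pid>/task/<tid>/schedstat (which depends on the scheduler info config options) and that the first field there is on-CPU time in nanoseconds; the function names and the 10-second interval are made up for the example.

```python
import os
import time


def parse_state(stat_text):
    """Return the one-letter task state from /proc/<pid>/task/<tid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    return stat_text.rsplit(")", 1)[1].split()[0]


def parse_runtime_ns(schedstat_text):
    """Return on-CPU time in ns (assumed to be the first schedstat field)."""
    return int(schedstat_text.split()[0])


def snapshot(pid):
    """Map tid -> (state, runtime_ns) for every thread of the given pid."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                state = parse_state(f.read())
            with open(os.path.join(task_dir, tid, "schedstat")) as f:
                runtime = parse_runtime_ns(f.read())
        except (IOError, OSError):
            continue  # thread went away between listdir() and open()
        result[int(tid)] = (state, runtime)
    return result


def stalled_threads(before, after):
    """Return tids that are 'R' in both snapshots with unchanged CPU time."""
    stalled = []
    for tid, (state_after, run_after) in after.items():
        if tid not in before:
            continue
        state_before, run_before = before[tid]
        if state_before == "R" and state_after == "R" and run_after == run_before:
            stalled.append(tid)
    return sorted(stalled)


def check(pid, interval=10):
    """Take two snapshots `interval` seconds apart and compare them."""
    first = snapshot(pid)
    time.sleep(interval)
    return stalled_threads(first, snapshot(pid))
```

Calling check() with the pid of the qemu process returns the thread ids that stayed in R state without gaining any runtime over the interval. A thread that is merely waiting on a busy runqueue would also show up, so a non-empty result is a hint to look closer, not proof of this bug.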