Re: XP machine freeze

On 19/04/15 23:48, Nadav Amit wrote:
> Brad Campbell <lists2009@xxxxxxxxxxxxxxx> wrote:
>
>> On 13/04/15 22:02, Paolo Bonzini wrote:
>>> On 13/04/2015 14:45, Brad Campbell wrote:
>>>> G'day Paolo,
>>>>
>>>> Yes, on AMD and I've tried hard to reproduce it on Intel and been unable
>>>> to thus far.
>>>>
>>>> Now you mention it may be AMD specific, I have a spare motherboard and
>>>> processor sitting in a drawer. I'll bolt it together tomorrow and see if
>>>> I can reproduce it on another AMD machine. Two machines should let me
>>>> test it twice as fast.
>>>>
>>>> I got a fail this afternoon, so I'm due to reboot tonight. I'll just
>>>> revert that one suspect commit from a known bad kernel and see if that
>>>> cleans it up. If not then I'll work through the remainder of the
>>>> information in your mail. I really appreciate the attention you've paid
>>>> to this, it has been a frustrating bug for me because I'm in a position
>>>> of not knowing what I don't know, and obviously doing something wrong in
>>>> very long bisection processes.
>>> Actually, if you have time to change your course of action, please
>>> revert the one that Nadav pointed out (f210f7572bed, KVM: x86:
>>> Fix lost interrupt on irr_pending race) or cherry-pick it on top of 3.17.
>>>
>>> Paolo
>> Ok, I think we have a winner. Patch manually plopped on top of vanilla 3.17. It has never gone for anywhere near this long on a bad kernel.
>>
>> brad@srv:~$ uptime
>> 23:24:48 up 6 days,  1:01,  3 users,  load average: 1.48, 1.95, 2.48
>>
>> So this patch went into the kernel during the 3.19 release cycle? Affected kernels 3.16-3.18?
> Actually, the original bug seemed to be introduced by commit
> 33e4c68656a2e461b296ce714ec322978de85412 "KVM: Optimize searching for
> highest IRR". So the bug goes all the way back to 2.6.32. The race that this
> patch fixes just became more apparent (i.e., likely to happen) on 3.16. It
> is fixed in 3.19.

And I can confidently state that over the years I've seen this happen a number of times, but in each case I was using qemu with an SDL console as a user-interactive VM, and moving the mouse would restore network connectivity. It was obviously seriously exacerbated by something that went into 3.16.

I really appreciate the assistance in pinning this down. At the next excuse for a reboot I'll upgrade the server to a 3.19.x kernel and call it done.

Regards,
Brad

--
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.
