Re: KVM guest crashes

On 24.01.2009, at 14:06, Marcelo Tosatti wrote:

On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote:
rarely now). You can use the no_timer_check kernel option to bypass
it.

Ok :-). Thanks. The logic in the kernel for this is really stupid
(basing timing on clock speed). What about disabling the check if we
detect KVM?

Yes, this is an option. We've talked about it before, but no patch was
merged. The RHEL5.3 kernel skips those checks when it detects VMware
or KVM hypervisors.

That sounds clever. But I doubt I'll get anything as intrusive into the SLES11 kernel at this point in time :-(.
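
For illustration only, here is a minimal sketch of the detection half of that idea, written in Python rather than kernel C. It assumes the hypervisor signature convention in which CPUID leaf 0x40000000 returns a 12-byte vendor string in EBX, ECX and EDX ("KVMKVMKVM" for KVM, "VMwareVMware" for VMware). The function names are hypothetical, the register values have to be supplied by the caller (Python cannot execute CPUID itself), and this is not the RHEL5.3 patch.

```python
import struct

# Vendor strings assumed to be reported via CPUID leaf 0x40000000.
KNOWN_SIGNATURES = {
    "KVMKVMKVM": "KVM",
    "VMwareVMware": "VMware",
}


def decode_signature(ebx, ecx, edx):
    """Turn the three 32-bit register values into the vendor string."""
    raw = struct.pack("<III", ebx, ecx, edx)  # registers are little-endian
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def should_skip_timer_check(ebx, ecx, edx):
    """Hypothetical policy: skip the timer check under a known hypervisor."""
    return decode_signature(ebx, ecx, edx) in KNOWN_SIGNATURES
```

A real kernel change would do the equivalent comparison in C during early boot and set the same state that no_timer_check sets.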

We should understand what is happening to fix the fullvirt/old guest
case. For the in-kernel PIT, I believe there is a bug somewhere, either
in PIT itself or in the interaction with IOAPIC (failure to inject
interrupts for some reason). I started debugging it by constantly
rebooting an SMP guest but my test box died. Hope to get back to it
soon.

Hm. If I ever get tracing working again, I can try to create one too :-).

The "Stuck ??" messages seem to be coming from smpboot.c. So for some
reason vcpus are being reset. It doesn't seem to be a triple fault,
because in that case all vcpus would be reset (so yes, the vcpu was
really executing BIOS code).

Hm. I know that OSX turns off CPUs it doesn't need as an alternative to
deep-sleep. Does Linux do that too?

Not that I know of, unless you offline CPUs manually, which does not
seem to be the case.

Nope, I don't hotplug anything (though the acpiphp module is loaded).

Suggest the following:
- Confirm the problem happens with root on an ext3 filesystem (can't you
mount the CIFS share and copy the data over to a local guest disk to
simulate similar load?).

I had Stuck ?? messages without networking, but if it helps I can try
that too. In the project we're using this for we do things over cifs, so
that's why I built the test case around it.

OK. Just trying to reduce the number of variables involved. I'll set up
a machine to run a similar load next week.

Sounds good :-). I put all the files I tested with online with a link in the first mail of this thread. So feel free to take that as an inspiration. For non-network testing I simply put -net none there, but still had the initrd boot and kill the machine.


Also, you mentioned "other reports" previously, can you point to them,
please?

Yes, will do later. I gotta run now! Thanks for the reply - it's good to
know this isn't getting ignored :-).

Have a good weekend.

Same to you. I was running for a first-aid course though, not the weekend :-).

I was mainly talking here about the thread "Guest Hang Bugs". Though with 2.6.25 guests I did get "BUG: soft lockup - CPU#x stuck for ns!" messages instead of the "Stuck ??" FWIW. Originally I created the whole test case to debug this exact bug we encountered as well: http://article.gmane.org/gmane.comp.emulators.kvm.devel/21828/
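
In case it helps anyone reproducing this with repeated boots, here is a rough sketch of how a captured guest console log could be scanned for the two symptoms above. It assumes the log is available as a sequence of text lines and that the soft lockup message carries a numeric CPU index and a duration in seconds; the function name and category labels are made up for illustration.

```python
import re
from collections import Counter

# Message shapes taken from the symptoms described in this thread.
PATTERNS = {
    "stuck": re.compile(r"Stuck \?\?"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"),
}


def count_hang_messages(lines):
    """Count how many lines match each known hang symptom."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Calling count_hang_messages(open(path)) on each run's console capture would give a per-boot tally, which makes it easier to compare failure rates between configurations.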

Alex