On 24.01.2009, at 14:06, Marcelo Tosatti wrote:
On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote:
rarely now). You can use the no_timer_check kernel option to bypass
it.
Ok :-). Thanks. The logic in the kernel for this is really stupid
(basing timing on clock speed). What about disabling the check if we
detect KVM?
Yes, this is an option. We've talked about it before, but no patch was
merged. The RHEL5.3 kernel skips those checks when it detects VMware
or KVM hypervisors.
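To illustrate the detection idea, here is a minimal sketch in Python of how a hypervisor vendor signature is decoded. This is not the RHEL5.3 patch. It assumes the caller has already executed CPUID leaf 0x40000000 (valid when the "hypervisor present" bit, CPUID.1:ECX bit 31, is set) and passes in the resulting EBX, ECX and EDX values. The function names are hypothetical and exist only for illustration.

```python
import struct

# 12-byte vendor signatures reported in EBX, ECX, EDX of CPUID leaf 0x40000000.
KVM_SIGNATURE = b"KVMKVMKVM\x00\x00\x00"
VMWARE_SIGNATURE = b"VMwareVMware"


def hypervisor_signature(ebx: int, ecx: int, edx: int) -> bytes:
    """Pack the three 32-bit register values into the 12-byte signature.

    The registers are concatenated in EBX, ECX, EDX order, each stored
    little-endian, which is how the signature string is laid out.
    """
    return struct.pack("<III", ebx, ecx, edx)


def should_skip_timer_check(ebx: int, ecx: int, edx: int) -> bool:
    """Hypothetical policy: skip the timer check under KVM or VMware."""
    signature = hypervisor_signature(ebx, ecx, edx)
    return signature in (KVM_SIGNATURE, VMWARE_SIGNATURE)


if __name__ == "__main__":
    # Register values a KVM guest reports for this leaf.
    print(should_skip_timer_check(0x4B4D564B, 0x564B4D56, 0x0000004D))
```

A real kernel change would perform the CPUID instruction itself and gate the timer check on the result. The sketch only covers the signature comparison, which is the part that decides between "bare metal" and "known hypervisor".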
That sounds clever. But I doubt I'll get anything as intrusive into
the SLES11 kernel at this point in time :-(.
We should understand what is happening to fix the fullvirt/old guest
case. For the in-kernel PIT, I believe there is a bug somewhere, either
in the PIT itself or in the interaction with the IOAPIC (failure to
inject interrupts for some reason). I started debugging it by constantly
rebooting an SMP guest, but my testbox died. Hope to get back to it
soon.
Hm. If I ever get tracing working again, I can try to create one
too :-).
The "Stuck ??" messages seem to be coming from smpboot.c. So for some
reason vcpus are being reset. It doesn't seem to be a triple fault,
because in that case all vcpus would be reset (so yes, the vcpu was
really executing BIOS code).
Hm. I know that OSX turns off CPUs it doesn't need as an alternative to
deep-sleep. Does Linux do that too?
Not that I know of, unless you offline CPUs manually, which does not
seem to be the case.
Nope, I don't hotplug anything (though the acpihp module is loaded).
Suggest the following:
- Confirm the problem happens with root on an ext3 filesystem (can't you
mount the CIFS share and copy the data over to a local guest disk to
simulate a similar load?).
I had "Stuck ??" messages without networking, but if it helps I can try
that too. In the project we're using this for we do things over CIFS, so
that's why I built the test case around it.
OK. Just trying to decrease the variables involved. I'll set up a
machine to run a similar load next week.
Sounds good :-). I put all the files I tested with online with a link
in the first mail of this thread. So feel free to take that as an
inspiration. For non-network testing I simply put -net none there, but
still had the initrd boot and kill the machine.
Also, you mentioned "other reports" previously, can you point to them,
please?
Yes, will do later. I gotta run now! Thanks for the reply - it's good to
know this isn't getting ignored :-).
Have a good weekend.
Same to you. I was running for a first-aid course though, not the
weekend :-).
I was mainly talking here about the thread "Guest Hang Bugs". Though
with 2.6.25 guests I did get "BUG: soft lockup - CPU#x stuck for ns!"
messages instead of the "Stuck ??" FWIW.
Originally I created the whole test case to debug this exact bug we
encountered as well: http://article.gmane.org/gmane.comp.emulators.kvm.devel/21828/
Alex