Re: Strange CPU usage pattern in SMP guest

Sebastian Hetze <s.hetze@xxxxxxxxxxxx> · Tue, 30 Mar 2010 10:27:43 +0200

On Tue, Mar 23, 2010 at 06:18:08PM -0300, Marcelo Tosatti wrote:
> On Mon, Mar 22, 2010 at 01:51:20PM +0100, Sebastian Hetze wrote:
> > On Sun, Mar 21, 2010 at 05:17:38PM +0200, Avi Kivity wrote:
> > > On 03/21/2010 04:55 PM, Sebastian Hetze wrote:
> > >> On Sun, Mar 21, 2010 at 02:19:40PM +0200, Avi Kivity wrote:
> > >>    
> > >>> On 03/21/2010 02:02 PM, Sebastian Hetze wrote:
> > >>>      
> > >>>> 12:46:02     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
> > >>>> 12:46:03     all    0,20   11,35   10,96    8,96    0,40    2,99    0,00    0,00   65,14
> > >>>> 12:46:03       0    1,00   11,00    7,00   15,00    0,00    1,00    0,00    0,00   65,00
> > >>>> 12:46:03       1    0,00    7,14    2,04    6,12    1,02   11,22    0,00    0,00   72,45
> > >>>> 12:46:03       2    0,00   15,00    1,00   12,00    0,00    1,00    0,00    0,00   71,00
> > >>>> 12:46:03       3    0,00   11,00   23,00    8,00    0,00    0,00    0,00    0,00   58,00
> > >>>> 12:46:03       4    0,00    0,00   50,00    0,00    0,00    0,00    0,00    0,00   50,00
> > >>>> 12:46:03       5    0,00   13,00   20,00    4,00    0,00    1,00    0,00    0,00   62,00
> > >>>>
> > >>>> So it is only CPU4 that is showing this strange behaviour.
> > >>>>
> > >>>>        
> > >>> Can you adjust irqtop to only count cpu4?  or even just post a few 'cat
> > >>> /proc/interrupts' from that guest.
> > >>>
> > >>> Most likely the timer interrupt for cpu4 died.
> > >>>      
> > >> I've added two keys +/- to your irqtop to focus up and down
> > >> in the row of available CPUs.
> > >> The irqtop for CPU4 shows a constant number of 6 local timer interrupts
> > >> per update, while the other CPUs show various higher values:
> > >>
> > >> irqtop for cpu 4
> > >>
> > >>   eth0                                      188
> > >>   Rescheduling interrupts                   162
> > >>   Local timer interrupts                      6
> > >>   ata_piix                                    3
> > >>   TLB shootdowns                              1
> > >>   Spurious interrupts                         0
> > >>   Machine check exceptions                    0
> > >>
> > >>
> > >> irqtop for cpu 5
> > >>
> > >>   eth0                                      257
> > >>   Local timer interrupts                    251
> > >>   Rescheduling interrupts                   237
> > >>   Spurious interrupts                         0
> > >>   Machine check exceptions                    0
> > >>
> > >> So the timer interrupt for cpu4 is not completely dead but somehow
> > >> broken.
> > >
> > > That is incredibly weird.
> > >
> > >> What can cause this problem? Any way to speed it up again?
> > >>    
> > >
> > > The host has 8 cpus and is only running this 6 vcpu guest, yes?
> > >
> > > Can you confirm the other vcpus are ticking at 250 Hz?
> > >
> > > What does 'top' show running on cpu 4?  Pressing 'f' 'j' will add a  
> > > last-used-cpu field in the display.
> > >
> > > Marcelo, any ideas?
> > 
> > Just to let you know, right after startup, all vcpus work fine.
> > 
> > The following message might be related to the problem:
> > hrtimer: interrupt too slow, forcing clock min delta to 165954639 ns
> > 
> > The guest is an 32bit system running on an 64bit host.
> 
> Sebastian,
> 
> Please apply the attached patch to your guest kernel.
> 

With this patch applied, the system has been running for 5 days without
hrtimer messages, and the timer interrupts look fine. However, I did get the
"Clocksource tsc unstable (delta = -4398046474878 ns)" message that I
reported on Sunday.

Note that when we restarted the system with the hrtimer patch applied,
we also changed the BIOS setting to disable Intel SmartStep on the host.
Since there are now no hrtimer messages at all, I cannot tell the two
changes apart: the SmartStep CPU frequency adjustment, rather than the
patched code, might be the real cause of the slow interrupts in the KVM
guest. Has anyone else experienced these problems?
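
For anyone who wants to compare per-CPU timer rates in a guest without
irqtop, here is a minimal sketch. It assumes an x86 guest where
/proc/interrupts has a "LOC:" row (Local timer interrupts) and a header
row listing the online CPUs. The function names (parse_loc, timer_rates,
sample_rates) are made up for this example and are not part of irqtop or
any other tool.

```python
import time


def parse_loc(text):
    """Return per-CPU 'Local timer interrupts' counts from /proc/interrupts text."""
    lines = text.splitlines()
    # The header row lists one column per online CPU: CPU0 CPU1 ...
    ncpus = len(lines[0].split())
    for line in lines[1:]:
        fields = line.split()
        # Assumption: the local timer row is labelled "LOC:" (x86).
        if fields and fields[0] == "LOC:":
            return [int(v) for v in fields[1:1 + ncpus]]
    raise ValueError("no LOC: row found")


def timer_rates(before, after, seconds):
    """Per-CPU timer interrupts per second between two snapshots."""
    old, new = parse_loc(before), parse_loc(after)
    return [(n - o) / seconds for o, n in zip(old, new)]


def sample_rates(interval=1.0, path="/proc/interrupts"):
    """Take two snapshots 'interval' seconds apart and return per-CPU rates."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return timer_rates(first, second, interval)
```

Calling sample_rates() inside the guest returns one value per vcpu. With
a 250 Hz tick, a busy vcpu should report a rate near 250, so a vcpu that
stays at a much lower value, like cpu4 above, stands out immediately.
Idle vcpus on a tickless kernel can legitimately show lower rates, so
the comparison is most meaningful under load.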