Re: WARNING: CPU: 1 PID: 0 at kernel/time/tick-broadcast.c:668 tick_broadcast_oneshot_control+0x17d/0x190()


On Tue, 11 Feb 2014, Stanislaw Gruszka wrote:

> On Mon, Feb 10, 2014 at 07:59:39PM +0100, poma wrote:
> > On 10.02.2014 11:06, Thomas Gleixner wrote:
> > > On Mon, 10 Feb 2014, poma wrote:
> > > 
> > >> [   83.558551]  [<ffffffff81025b17>] amd_e400_idle+0x87/0x130
> > > 
> > > So this seems to happen only on AMD machines which use that e400 idle
> > > mode. I have no idea at the moment what's wrong there. I'll find one of
> > > those machines and try to reproduce.
> 
> I tried to debug that warning as well. Even though I found a machine with
> the proper family and model number, the HW C1E bug does not happen there,
> hence I just hacked the kernel to always use amd_e400_idle (and removed the
> AMD-specific rdmsr instructions so that it does not crash). That makes the
> issue 100% reproducible on suspend/resume.

It's also reproducible on cpu online/offline.
 
> It happens when a cpu becomes idle and calls
> CLOCK_EVT_NOTIFY_BROADCAST_ENTER, but an interrupt triggers on that cpu
> before CLOCK_EVT_NOTIFY_BROADCAST_EXIT. The IRQ is handled by the hrtimer
> code, which wants to switch to hres and calls:
> 
> tick_switch_to_oneshot() -> ... -> tick_broadcast_setup_oneshot()
> 
> Since we already have the proper handler there, the last procedure clears
> tick_broadcast_oneshot_mask, but tick_broadcast_pending_mask stays
> set. The next time amd_e400_idle calls CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
> the warning triggers.
> 
> I came up with the patch below, which also clears the pending mask, but perhaps

Fun. I came up with the exact same solution independently of you, and I
tested it on real C1E-contaminated hardware.

> oneshot_mask should not be cleared on tick_broadcast_setup_oneshot(),
> or should be cleared only conditionally, or some other solution is

We can do it unconditionally. It creates consistent state in all
corner cases.

There are other solutions to the problem, but they need a major
rework of the broadcast code. I so wish that this mess had never
been necessary at all ...

Thanks,

	tglx