Re: [tip:timers/urgent] tick: Cleanup NOHZ per cpu data on cpu down

On 05/12/2013 06:27 AM, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  4b0c0f294f60abcdd20994a8341a95c8ac5eeb96
> Gitweb:     http://git.kernel.org/tip/4b0c0f294f60abcdd20994a8341a95c8ac5eeb96
> Author:     Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> AuthorDate: Fri, 3 May 2013 15:02:50 +0200
> Committer:  Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitDate: Sun, 12 May 2013 12:20:09 +0200
> 
> tick: Cleanup NOHZ per cpu data on cpu down
> 
> Prarit reported a crash on CPU offline/online. The reason is that on
> CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
> up. If at cpu online an interrupt happens before the per cpu tick
> device is registered the irq_enter() check potentially sees stale data
> and dereferences a NULL pointer.
> 
> Cleanup the data after the cpu is dead.

Thomas, while this does fix up the NULL pointer issue, I think you've introduced
a new bug in the schedule timer code.

While repeatedly taking the same CPU down and bringing it back up, I now
occasionally see long delays in both directions...

[   65.150073] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   66.715339] smpboot: CPU 19 is now offline
[   67.752751] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   68.758711] smpboot: CPU 19 is now offline

Everything is normal ...

[   69.711612] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   70.731521] smpboot: CPU 19 is now offline

Long delay in bringing CPU "down"

[   81.744565] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   82.848591] smpboot: CPU 19 is now offline

Long delay in bringing CPU "up"

[   89.826533] smpboot: Booting Node 1 Processor 19 APIC 0x28
[   84.905358] smpboot: CPU 19 is now offline
[   87.565274] smpboot: Booting Node 1 Processor 19 APIC 0x28

Also, if the system is in this state I cannot reboot -- the system appears to
hang while bringing down CPUs...
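
The up/down cycling described above can be sketched as follows. This is only a
minimal sketch, assuming the standard sysfs hotplug control file
/sys/devices/system/cpu/cpuN/online and root privileges; the helper names
(cpu_online_path, set_cpu_online, cycle_cpu) are hypothetical and are not the
test that produced the log above.

```c
#include <assert.h>
#include <stdio.h>

/* Build the sysfs hotplug control path for a CPU. Returns 0 on success,
 * -1 if the buffer is too small. */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write "1" (online) or "0" (offline) to the control file.
 * Needs root on a real system. Returns 0 on success, -1 on failure. */
static int set_cpu_online(const char *path, int online)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(online ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)	/* the kernel reports hotplug errors at flush time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Take one CPU down and back up `rounds` times; stop at the first failure. */
static int cycle_cpu(int cpu, int rounds)
{
	char path[64];
	int i;

	if (cpu_online_path(path, sizeof(path), cpu) != 0)
		return -1;
	for (i = 0; i < rounds; i++) {
		if (set_cpu_online(path, 0) != 0 || set_cpu_online(path, 1) != 0)
			return -1;
	}
	return 0;
}
```

Calling cycle_cpu(19, 100) as root while watching dmesg timestamps would be one
way to look for the gaps shown in the log; the round count is arbitrary.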

Oddly, if I do

+       memset(ts, 0, sizeof(*ts));
+       ts->tick_stopped = 1;

instead of your memset, everything works.  I'm looking at the tick-sched.c code
to see why setting tick_stopped = 1 seems to fix the problem.
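
To make the difference between the two variants concrete, here is a user-space
sketch. It assumes a stripped-down stand-in for the per-cpu tick state: only the
tick_stopped field is taken from this mail, while the struct name, the other
fields, and the function names are hypothetical placeholders and not the
kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the per-cpu tick state. Only tick_stopped is
 * named in the mail; the other fields are placeholders. */
struct tick_sched_sketch {
	unsigned long idle_calls;	/* placeholder */
	int inidle;			/* placeholder */
	int tick_stopped;		/* field named in the mail */
};

/* Plain cleanup: wipe all state left behind by the dead CPU. */
static void cleanup_plain(struct tick_sched_sketch *ts)
{
	assert(ts != NULL);
	memset(ts, 0, sizeof(*ts));
}

/* Variant from the mail: wipe the state, then mark the tick as stopped. */
static void cleanup_with_tick_stopped(struct tick_sched_sketch *ts)
{
	assert(ts != NULL);
	memset(ts, 0, sizeof(*ts));
	ts->tick_stopped = 1;
}
```

The only observable difference is the value of tick_stopped after cleanup (0
versus 1); why that changes the hotplug timing is the open question here, and
the sketch does not attempt to answer it.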

P.