RE: v3.18-RT

Hi Sebastian,

Were you able to gain any insight from the traces?

If we were to proceed with reverting the kernel/sched/core.c patch in our build of 3.18.29-rt30, would adding the WARN_ON_ONCE(p->migrate_disable_atomic <= 0) debug check you recommended (2016/07/29) be sufficient to detect migrate_disable()/migrate_enable() imbalances? We would run extended testing on multiple systems to determine the effects of reverting the patch.

Cheers,
Carol
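A rough calculation may help with sizing the extended testing. Assume each reboot is an independent trial with a fixed crash probability p. The rates reported further down the thread are about 1 in 20 reboots on 3.18.29-rt30 and 1 in 100 on 3.18.27-rt27. Under that assumption, the chance that n consecutive reboots all come up clean even though the bug is still present is (1 - p)^n. The C sketch below finds the smallest n that brings that chance down to a chosen threshold alpha. `reboots_needed` is a hypothetical helper. Boot-time races may not behave like independent trials, so treat the result as a rough guide rather than proof.

```c
#include <assert.h>

/*
 * Hypothetical helper (assumption: reboots are independent trials with a
 * fixed per-boot crash probability p). Returns the smallest n such that
 * (1 - p)^n <= alpha, i.e. the number of consecutive crash-free reboots
 * after which a still-present bug would have been missed with probability
 * at most alpha. Returns -1 for out-of-range inputs.
 */
static int reboots_needed(double p, double alpha)
{
	double miss = 1.0; /* probability that all n reboots so far were clean */
	int n = 0;

	if (p <= 0.0 || p > 1.0 || alpha <= 0.0 || alpha >= 1.0)
		return -1;

	while (miss > alpha) {
		miss *= 1.0 - p;
		n++;
	}
	return n;
}
```

Under these assumptions, p = 0.05 with alpha = 0.05 gives 59 clean reboots, and p = 0.01 with alpha = 0.05 gives 299.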

> -----Original Message-----
> From: Carol Wong
> Sent: Wednesday, August 03, 2016 6:32 PM
> To: 'Sebastian Andrzej Siewior'
> Cc: linux-rt-users@xxxxxxxxxxxxxxx; David Hauck; Preston Hauck
> Subject: RE: v3.18-RT
> 
> Hi Sebastian,
> 
> I made the suggested change to sched/core.c and verified that
> CONFIG_SCHED_DEBUG=y. I reproduced the crash 3 times and captured the
> attached traces.
> 
> Thanks,
> Carol
> 
> > -----Original Message-----
> > From: Sebastian Andrzej Siewior [mailto:bigeasy@xxxxxxxxxxxxx]
> > Sent: Friday, July 29, 2016 9:20 AM
> > To: Carol Wong
> > Cc: linux-rt-users@xxxxxxxxxxxxxxx; David Hauck; Preston Hauck
> > Subject: Re: v3.18-RT
> >
> > * Carol Wong | 2016-07-20 20:53:21 [+0000]:
> >
> > >Hi Sebastian,
> > Hi Carol,
> >
> > >We finally traced the boot-up crash to the following patch in
> > >kernel/sched/core.c:
> > >
> > >https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.18-rt&id=62044e554f14547061afcfef7f0aceda43e28982
> > >
> > >After reverting the two-line patch in 3.18.29-rt30, the crash no
> > >longer occurs on our dual Xeon (2x12 core) system.
> > >
> > >Other observations:
> > >- Does not reproduce on single processor (2 and 4 core) systems
> > >- Reproduces under 3.18.27-rt27 and 3.18.36-rt38 on the dual Xeon
> > >- Does not reproduce on 3.18.27-rt26 and earlier on the dual Xeon
> > >- Reproduces more frequently on .29-rt30 (1 in 20 reboots) compared
> > >  to .27-rt27 (1 in 100 reboots)
> > >
> > >So far we've not observed any side effects after reverting this
> > >patch.
> >
> > This was part of the CPU hotplug fixups. Lockdep might be broken
> > without it, but I am not sure whether that is the case most of the
> > time or only during hotplug.
> >
> > >I understand that a high core count system may not be easy to come
> > >by, so if there are diagnostics or patches you would like us to try
> > >on the dual Xeon system, we can assist with that.
> >
> > With that patch, migrate_disable() skips the whole preempt-lazy +
> > pin-cpu code if it is called with IRQs off. Since interrupts are
> > disabled, the task can't migrate to another CPU anyway, so it is a
> > valid optimisation.
> > It only makes a difference if the migrate_disable() and
> > migrate_enable() calls are not balanced. The commit
> >   https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.18-rt&id=8d51d3a296b6ec4aebd0d6d7e1b7162cd9bf6662
> > is one example where I fixed such an imbalance.
> > Do you get additional backtraces with CONFIG_SCHED_DEBUG enabled?
> >
> > There is one thing the debug code does not cover, so could you please
> > add this chunk?
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 140ee06079b6..1f8613f77598 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3229,6 +3229,7 @@ void migrate_enable(void)
> >
> >  	if (in_atomic() || irqs_disabled()) {
> >  #ifdef CONFIG_SCHED_DEBUG
> > +		WARN_ON_ONCE(p->migrate_disable_atomic <= 0);
> >  		p->migrate_disable_atomic--;
> >  #endif
> >  		return;
> > >Cheers,
> > >Carol Wong
> > >NetAcquire Corporation
> >
> > Sebastian
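The counter accounting Sebastian describes for the patch can be illustrated with a small userspace model. This is a hypothetical sketch, not the 3.18-rt kernel code. `struct task_model`, `model_migrate_disable()`, `model_migrate_enable()`, `warn_on_once()` and the `irqs_off` flag are invented stand-ins for the task's counters, migrate_disable()/migrate_enable(), WARN_ON_ONCE() and irqs_disabled(). The in_atomic() case is omitted, and the preempt-lazy + pin-cpu work is reduced to a `pinned` flag. The model shows how a migrate_disable() taken with IRQs on but released with IRQs off leaves the task pinned, and how the suggested `migrate_disable_atomic <= 0` check fires in that case. It only implements that one check, so it says nothing about whether the opposite ordering (disable with IRQs off, enable with IRQs on) is caught by other debug code in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the per-task state; not kernel code. */
struct task_model {
	int migrate_disable;        /* nesting depth of the pinning path */
	int migrate_disable_atomic; /* nesting depth taken with IRQs off */
	bool pinned;                /* stands in for preempt-lazy + pin-cpu */
};

static bool irqs_off; /* simulated irqs_disabled() */
static int warn_hits; /* counts every time the check's condition is true */

/* Like WARN_ON_ONCE(): only the first hit at this call site is printed. */
static void warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (!cond)
		return;
	warn_hits++;
	if (!warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

static void model_migrate_disable(struct task_model *p)
{
	if (irqs_off) {
		/* Fast path: with IRQs off the task cannot migrate anyway. */
		p->migrate_disable_atomic++;
		return;
	}
	if (p->migrate_disable++ == 0)
		p->pinned = true; /* outermost level pins the task */
}

static void model_migrate_enable(struct task_model *p)
{
	if (irqs_off) {
		/* Mirrors the debug check from the quoted patch. */
		warn_on_once(p->migrate_disable_atomic <= 0,
			     "migrate_disable_atomic underflow");
		p->migrate_disable_atomic--;
		return;
	}
	if (--p->migrate_disable == 0)
		p->pinned = false; /* outermost level unpins the task */
}
```

For example, calling model_migrate_disable() with `irqs_off` false and then model_migrate_enable() with `irqs_off` true leaves `migrate_disable` at 1, `pinned` still true and `migrate_disable_atomic` at -1, and trips the check once. A balanced pair under either setting leaves all counters at zero without a warning.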