Hi Sebastian,

You wrote:
> One thing on the bisect. The git tree has the patches in this order:
> (1) kernel: migrate_disable() do fastpath in atomic & irqs-off
> (2) kernel: softirq: unlock with irqs on
>
> but you need to apply Patch #2 before #1. So if you bisect and you hit
> warnings due to #1 please note that you need to apply #2.
>
> T01 and T02 show probably the same issue but there are too many
> warnings coming in parallel. If this comes from the sched patch due to
> the #1/#2 mix up then don't bisect here or have them both applied.
> The call path itself does look special as it would violate the rule
> of atomic locking / unlocking (as it was fixed in #2 for instance).
> At this point I assume that your bisect went wrong due to patch
> #1/#2.

The traces were produced using the original 3.18.29-rt30 kernel (with
all patches) plus the addition of
WARN_ON_ONCE(p->migrate_disable_atomic <= 0) in migrate_enable() and
CONFIG_SCHED_DEBUG=y.

When I revert only patch #1 from the 3.18.29-rt30 kernel, the kernel
never crashes. I've been performing long-running tests on a dual Xeon
system and a quad-core i7 system with patch #1 reverted.

Cheers,
Carol

> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:bigeasy@xxxxxxxxxxxxx]
> Sent: Thursday, September 08, 2016 6:45 AM
> To: Carol Wong
> Cc: linux-rt-users@xxxxxxxxxxxxxxx; David Hauck; Preston Hauck
> Subject: Re: v3.18-RT
>
> On 2016-08-19 00:41:46 [+0000], Carol Wong wrote:
> > Hi Sebastian,
> Hi Carol,
>
> > Were you able to gain any insight from the traces?
>
> not really. T00 shows a fault in
> [ 2.756284] BUG: unable to handle kernel NULL pointer dereference at 00000004
> [ 2.756289] IP: [<c11653e7>] kmem_cache_alloc+0x87/0x230
> from ida_pre_get() / create_worker(). That is quite late so I have no
> idea why that would happen.
> The other two are not really helpful.
>
> > If we were to proceed with reverting the kernel/sched/core.c patch
> > in our build of 3.18.29-rt30, would the addition of the
> > WARN_ON_ONCE(p->migrate_disable_atomic <= 0) debug check that you
> > recommended (2016/07/29) be sufficient for detecting imbalances? We
> > would perform extended testing on multiple systems to determine the
> > effects of reverting the patch.
>
> One thing on the bisect. The git tree has the patches in this order:
> (1) kernel: migrate_disable() do fastpath in atomic & irqs-off
> (2) kernel: softirq: unlock with irqs on
>
> but you need to apply Patch #2 before #1. So if you bisect and you hit
> warnings due to #1 please note that you need to apply #2.
>
> T01 and T02 show probably the same issue but there are too many
> warnings coming in parallel. If this comes from the sched patch due to
> the #1/#2 mix up then don't bisect here or have them both applied.
> The call path itself does look special as it would violate the rule
> of atomic locking / unlocking (as it was fixed in #2 for instance).
> At this point I assume that your bisect went wrong due to patch
> #1/#2.
>
> > Cheers,
> > Carol
>
> Sebastian