On 2016-08-19 00:41:46 [+0000], Carol Wong wrote:
> Hi Sebastian,

Hi Carol,

> Were you able to gain any insight from the traces?

Not really. T00 shows a fault in

[ 2.756284] BUG: unable to handle kernel NULL pointer dereference at 00000004
[ 2.756289] IP: [<c11653e7>] kmem_cache_alloc+0x87/0x230

from ida_pre_get() / create_worker(). That is quite late, so I have no idea
why that would happen. The other two are not really helpful.

> If we were to proceed with reverting the kernel/sched/core.c patch in our
> build of 3.18.29-rt30, would the addition of the
> WARN_ON_ONCE(p->migrate_disable_atomic <= 0) debug check that you
> recommended (2016/07/29) be sufficient for detecting imbalances? We would
> perform extended testing on multiple systems to determine the effects of
> reverting the patch.

One thing on the bisect. The git tree has the patches in this order:
  (1) kernel: migrate_disable() do fastpath in atomic & irqs-off
  (2) kernel: softirq: unlock with irqs on
but you need to apply patch #2 before #1. So if you bisect and hit warnings
due to #1, please note that you need to apply #2 as well.

T01 and T02 probably show the same issue, but there are too many warnings
coming in parallel. If this comes from the sched patch due to the #1/#2
mix-up, then don't bisect here, or have them both applied. The call path
itself does look special as it would violate the rule of atomic locking /
unlocking (as it was fixed in #2, for instance). At this point I assume that
your bisect went wrong due to patch #1/#2.

> Cheers,
> Carol

Sebastian