On 02/25/2016 11:09 AM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 10:40 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> The crash itself is in try_to_wake_up() (again, assuming the stacktrace is
>> valid).
>
> No, the crash seems to be off in la-la-land

I meant the last-known-good address is try_to_wake_up(); in the same way
that RIP @ 0 crashes, but no one says the crash is @ NULL.

>, judging by the oops:
>
>   IP: [<ffff88023fd40000>] 0xffff88023fd40000
>
> which isn't kernel code at all. It is close to, but not at, the percpu
> area you point out.

Assuming ffff88023fdc0000 is percpu start for cpu 7 then I'm pretty sure
ffff88023fd40000 is percpu start for cpu 6.

Either way, RIP is almost certainly in the percpu block.

> But yes, the call trace looks accurate and makes sense, we have:
>
>   tty_flip_buffer_push ->
>   (queue_work is inline) ->
>   queue_work_on ->
>   __queue_work ->
>   insert_work ->
>   (wake_up_worker is inlined)
>   wake_up_process -> try_to_wake_up ->
>   *insane non-code address*
>
> but I cannot for the life of me see how we get to an insane address.
> It smells like stack corruption when returning from try_to_wake_up()
> or something like that.
>
> Hmm. Actually, try_to_wake_up() will do several indirect calls
> (task_waking and select_task_rq, and it_func_ptr->fn for tracing), but
> then I'd expect to see try_to_wake_up itself in the stack trace.
> Of course, when you jump to la-la-land, crazy things can happen. And
> that offending IP is at a page boundary, so it might have run some
> random code on the previous page.
>
> Quite frankly, neither ->task_waking() nor ->select_task_rq() look
> very likely.

Agreed, the sched_class indirections do not seem likely.

> But the tracepoint stuff is actually fairly dynamic, and
> does things like
>
>   it_func_ptr = rcu_dereference_sched((tp)->funcs);
>
> to get the function pointer information, so if there is some race in
> there, anything can happen.
>
> Jiri, were you messing around with tracing when this happened? Or
> maybe shutting down CPUs? There was an RCU locking problem with CPU
> shutdown, maybe this is one of the symptoms. The fix for that is
> recent, and not in 4.4.2.
>
> Adding Steven Rostedt to the cc. Steven, does that look like a possible case?
>
>             Linus
>