On Mon, 2007-07-23 at 09:08 -0700, Daniel Walker wrote: > On Sat, 2007-07-21 at 23:07 +0100, Rui Nuno Capela wrote: > > > Call Trace: > > [<c010622a>] show_trace_log_lvl+0x1a/0x30 > > [<c01062f6>] show_stack_log_lvl+0xb6/0xe0 > > [<c0106521>] show_registers+0x201/0x330 > > [<c0106768>] die+0x118/0x260 > > [<c0304233>] do_page_fault+0x193/0x600 > > [<c030294a>] error_code+0x72/0x78 > > [<c011b4af>] activate_task+0x4f/0xb0 > > [<c011e09d>] try_to_wake_up+0x2bd/0x420 > > [<c011e279>] wake_up_process_mutex+0x19/0x20 > > [<c01425cc>] wakeup_next_waiter+0xec/0x1a0 > > [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70 > > [<c0301ff6>] rt_spin_unlock+0x26/0x30 > > [<c015b3e4>] put_zone_pcp+0x14/0x20 > > [<c015c265>] get_page_from_freelist+0x145/0x380 > > [<c015c4f4>] __alloc_pages+0x54/0x2d0 > > [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0 > > [<c0304398>] do_page_fault+0x2f8/0x600 > > [<c030294a>] error_code+0x72/0x78 > > ======================= > > I was able to reproduce a similar looking hang when I combine kernbench > running with another load (I used ltpstress.sh from LTP) .. > > I'm debugging it now .. It looks like sched_class->enqueue_task() is NULL and that's why the system hangs .. The reason why that happens is because check_pgt_cache() is called from the idle thread, and with PREEMPT_RT check_pgt_cache() locks at least one mutex .. Once the idle thread is on a wait_list, as soon as it's woke by the mutex owner the system will crash in enqueue_task. Since the idle thread has a NULL sched_class->enqueue_task .. check_pgt_cache() is already getting called from the desched_thread() , so I think it could just be removed from i386 cpu_idle(). Anyone have comments on the theory above? Daniel - To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html