Hi all,

May I ask a question about the scheduler (sched_rt.c)? I would like to find out whether a related patch posted to the linux-kernel mailing list is valid:

    http://lkml.org/lkml/2008/9/25/189

I recently hit a kernel panic that was caused by the BUG_ON in pick_next_pushable_task() (my kernel version is 2.6.35, on an ARM platform).

    static struct task_struct *pick_next_pushable_task(struct rq *rq)
    {
            ...
            BUG_ON(task_current(rq, p));    <------------ panic here!!!
            ...
    }

    <4>[17583.180664] [<c00a3888>] (pick_next_pushable_task+0x4c/0xa4) from [<c00ae21c>] (push_rt_task+0x20/0x264)
    <4>[17583.180725] [<c00ae21c>] (push_rt_task+0x20/0x264) from [<c00ae554>] (post_schedule_rt+0x14/0x20)
    <4>[17583.180816] [<c00ae554>] (post_schedule_rt+0x14/0x20) from [<c069352c>] (schedule+0x738/0x7c8)

I checked the patch history related to push_rt_task(), and I think the following patch may explain the panic, since dequeuing a task improperly could corrupt the task pointer by mistake:

    https://lkml.org/lkml/2011/8/14/71

    Commit-ID: 311e800e16f63d909136a64ed17ca353a160be59
    Author: Hillf Danton <dhillf@xxxxxxxxx>

    sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()

    Do not call dequeue_pushable_task() when failing to push an eligible
    task, as it remains pushable, merely not at this particular moment.

    Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
    Signed-off-by: Mike Galbraith <mgalbraith@xxxxxx>
    Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
    Cc: Yong Zhang <yong.zhang0@xxxxxxxxx>

In the commit history of sched_rt.c I also found a similar patch that was submitted earlier, in 2008. However, it was not picked up in the latest kernel code, so I am wondering whether this patch is valid.

    commit 1563513d34ed4b12ef32bc2adde4a53ce05701a1
    Author: Gregory Haskins <ghaskins@xxxxxxxxxx>
    Date:   Mon Dec 29 09:39:53 2008 -0500

    RT: fix push_rt_task() to handle dequeue_pushable properly

    A panic was discovered by Chirag Jog where a BUG_ON sanity check
    in the new "pushable_task" logic would trigger a panic under
    certain circumstances:

    http://lkml.org/lkml/2008/9/25/189

    Gilles Carry discovered that the root cause was attributed to the
    pushable_tasks list getting corrupted in the push_rt_task logic.
    This was the result of a dropped rq lock in double_lock_balance
    allowing a task in the process of being pushed to potentially migrate
    away, and thus corrupt the pushable_tasks() list.

    I traced back the problem as introduced by the pushable_tasks patch
    that went in recently. There is a "retry" path in push_rt_task()
    that actually had a compound conditional to decide whether to retry
    or exit. I missed the meaning behind the rationale for the virtual
    "if(!task) goto out;" portion of the compound statement and
    thus did not handle it properly. The new pushable_tasks logic
    actually creates three distinct conditions:

    1) an untouched and unpushable task should be dequeued
    2) a migrated task where more pushable tasks remain should be retried
    3) a migrated task where no more pushable tasks exist should exit

    The original logic mushed (1) and (3) together, resulting in the
    system dequeuing a migrated task (against an unlocked foreign run-queue
    nonetheless).

    To fix this, we get rid of the notion of "paranoid" and we support the
    three unique conditions properly. The paranoid feature is no longer
    relevant with the new pushable logic (since pushable naturally limits
    the loop) anyway, so lets just remove it.

    Reported-By: Chirag Jog <chirag@xxxxxxxxxxxxxxxxxx>
    Found-by: Gilles Carry <gilles.carry@xxxxxxxx>
    Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>

Best Regards,
Jane Zhou
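
P.S. To make the three conditions listed in the 2008 commit message concrete, here is a minimal user-space sketch in C. It is not the kernel code: `struct task`, `enum push_action`, `classify_failed_push()` and its parameters are hypothetical names used only for illustration. It assumes the caller has just retaken the run-queue lock after a failed push and has asked the pushable list for its next candidate again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct; only identity matters here. */
struct task {
    int pid;
};

/* Hypothetical outcome names; the kernel defines no such enum. */
enum push_action {
    PUSH_DEQUEUE, /* 1) untouched and unpushable task: dequeue it        */
    PUSH_RETRY,   /* 2) task migrated, more pushable tasks remain: retry */
    PUSH_EXIT     /* 3) task migrated, no pushable tasks left: exit      */
};

/*
 * next_task:   the task we tried to push before the rq lock was dropped.
 * still_local: true if next_task is still on this run-queue (assumption:
 *              the caller checks this under the retaken lock).
 * recheck:     the next pushable candidate after retaking the lock,
 *              or NULL if the pushable list is now empty.
 */
static enum push_action classify_failed_push(const struct task *next_task,
                                             bool still_local,
                                             const struct task *recheck)
{
    assert(next_task != NULL);

    /* Case 1: nothing moved, the task simply cannot be pushed right now. */
    if (still_local && recheck == next_task)
        return PUSH_DEQUEUE;

    /* Case 3: the task went away and nothing else is pushable. */
    if (recheck == NULL)
        return PUSH_EXIT;

    /* Case 2: something shifted; try again with the new candidate. */
    return PUSH_RETRY;
}
```

The point of keeping case 1 and case 3 separate is that only in case 1 is it safe to touch next_task's pushable entry, because only then is the task known to still be on the locked run-queue. As far as I understand the 2011 patch quoted above, it would go one step further and skip the dequeue in case 1 as well.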