May I ask a question about the scheduler (sched_rt.c)?

MING ZHOU <a17711@xxxxxxxxxxxx> · Tue, 1 Nov 2011 11:46:47 +0800

Hi all,

May I ask a question about the scheduler (sched_rt.c)? I want to confirm
whether a patch related to the following report on the linux-kernel
mailing list is valid:
http://lkml.org/lkml/2008/9/25/189

I recently encountered a kernel panic caused by the BUG_ON in
pick_next_pushable_task() (kernel version 2.6.35, ARM platform):

static struct task_struct *pick_next_pushable_task(struct rq *rq)
{
	...
	BUG_ON(task_current(rq, p));	/* <-- panic here */
	...
}
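To make the failing check concrete, here is a minimal user-space model (not kernel code) of the invariant that BUG_ON(task_current(rq, p)) enforces: the head of the pushable list must never be the task currently running on that run-queue. The types model_task and model_rq and the function model_pick_next_pushable() are hypothetical stand-ins, and the model reports the violation through a flag instead of panicking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and struct rq (NOT kernel types). */
struct model_task {
    int pid;
    int prio;                        /* lower value = higher priority */
};

struct model_rq {
    struct model_task *curr;         /* task currently running on this CPU */
    struct model_task *pushable[8];  /* pushable list, highest priority first */
    size_t nr_pushable;
};

/* Same idea as task_current(rq, p): is p the running task of rq? */
static bool model_task_current(const struct model_rq *rq,
                               const struct model_task *p)
{
    return rq->curr == p;
}

/*
 * Return the head of the pushable list (NULL if empty). Where the kernel
 * would hit BUG_ON(task_current(rq, p)), set *violated instead.
 */
static struct model_task *model_pick_next_pushable(const struct model_rq *rq,
                                                   bool *violated)
{
    assert(rq != NULL && violated != NULL);
    *violated = false;
    if (rq->nr_pushable == 0)
        return NULL;

    struct model_task *p = rq->pushable[0];
    if (model_task_current(rq, p))
        *violated = true;            /* stale entry: the running task itself */
    return p;
}
```

In this model, violated becomes true only when the running task is still sitting at the head of the pushable list, which is the inconsistent state the panic above points to.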

<4>[17583.180664] [<c00a3888>] (pick_next_pushable_task+0x4c/0xa4) from [<c00ae21c>] (push_rt_task+0x20/0x264)
<4>[17583.180725] [<c00ae21c>] (push_rt_task+0x20/0x264) from [<c00ae554>] (post_schedule_rt+0x14/0x20)
<4>[17583.180816] [<c00ae554>] (post_schedule_rt+0x14/0x20) from [<c069352c>] (schedule+0x738/0x7c8)

I checked the patch history of push_rt_task(), and I think the
following patch may be related to the cause, since dequeuing a task
improperly may corrupt the task pointer.

 https://lkml.org/lkml/2011/8/14/71
Commit-ID:  311e800e16f63d909136a64ed17ca353a160be59
Author:     Hillf Danton <dhillf@xxxxxxxxx>

sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()

Do not call dequeue_pushable_task() when failing to push an eligible
task, as it remains pushable, merely not at this particular moment.

Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
Signed-off-by: Mike Galbraith <mgalbraith@xxxxxx>
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Yong Zhang <yong.zhang0@xxxxxxxxx>
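The effect described in this commit can be sketched with a tiny user-space model of the pushable list. Everything here (model_list, model_dequeue(), model_failed_push()) is hypothetical and only illustrates the difference between removing the task after a failed push (the call this patch drops) and leaving the still-eligible task queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX 8

/* Hypothetical pushable list holding task ids, highest priority first. */
struct model_list {
    int ids[MODEL_MAX];
    size_t n;
};

/* Remove the first occurrence of id, keeping the remaining order. */
static void model_dequeue(struct model_list *l, int id)
{
    for (size_t i = 0; i < l->n; i++) {
        if (l->ids[i] == id) {
            for (size_t j = i + 1; j < l->n; j++)
                l->ids[j - 1] = l->ids[j];
            l->n--;
            return;
        }
    }
}

static bool model_contains(const struct model_list *l, int id)
{
    for (size_t i = 0; i < l->n; i++)
        if (l->ids[i] == id)
            return true;
    return false;
}

/*
 * No lower-priority run-queue was found for head task `id`, and the task
 * has not moved. old_behavior = true models the dequeue that the patch
 * removes; false models leaving the still-eligible task on the list.
 */
static void model_failed_push(struct model_list *l, int id, bool old_behavior)
{
    assert(l->n > 0 && l->ids[0] == id);
    if (old_behavior)
        model_dequeue(l, id);
    /* else: the task stays pushable, merely not pushable at this moment */
}
```

With old_behavior set, the task drops off the model's list, so later push attempts in the model no longer see it even though it is still eligible.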

I also noticed in the commit history of sched_rt.c that a similar
patch was submitted in 2008. However, it does not appear to be in the
latest kernel code, so I am wondering whether this patch is valid.

commit 1563513d34ed4b12ef32bc2adde4a53ce05701a1
Author: Gregory Haskins <ghaskins@xxxxxxxxxx>
Date:   Mon Dec 29 09:39:53 2008 -0500

RT: fix push_rt_task() to handle dequeue_pushable properly

  A panic was discovered by Chirag Jog where a BUG_ON sanity check
  in the new "pushable_task" logic would trigger a panic under
  certain circumstances:

  http://lkml.org/lkml/2008/9/25/189

  Gilles Carry discovered that the root cause was attributed to the
  pushable_tasks list getting corrupted in the push_rt_task logic.
  This was the result of a dropped rq lock in double_lock_balance
  allowing a task in the process of being pushed to potentially migrate
  away, and thus corrupt the pushable_tasks() list.

  I traced back the problem as introduced by the pushable_tasks patch
  that went in recently.   There is a "retry" path in push_rt_task()
  that actually had a compound conditional to decide whether to
  retry or exit.  I missed the meaning behind the rationale for the
  virtual "if(!task) goto out;" portion of the compound statement and
  thus did not handle it properly.  The new pushable_tasks logic
  actually creates three distinct conditions:

  1) an untouched and unpushable task should be dequeued
  2) a migrated task where more pushable tasks remain should be retried
  3) a migrated task where no more pushable tasks exist should exit

  The original logic mushed (1) and (3) together, resulting in the
  system dequeuing a migrated task (against an unlocked foreign run-queue
  nonetheless).

  To fix this, we get rid of the notion of "paranoid" and we support the
  three unique conditions properly.  The paranoid feature is no longer
  relevant with the new pushable logic (since pushable naturally limits
  the loop) anyway, so lets just remove it.

  Reported-By: Chirag Jog <chirag@xxxxxxxxxxxxxxxxxx>
  Found-by: Gilles Carry <gilles.carry@xxxxxxxx>
  Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
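The three conditions listed in this commit can be written as a small decision function. This is a hypothetical user-space model, not the kernel's push_rt_task(): next_id is the task we tried to push, head_id is the current head of the pushable list (0 meaning empty), and same_cpu says whether the task is still on this run-queue. model_old_decide() is my reading of the "mushed" logic from the commit text (the paranoid retry counter is omitted); model_new_decide() separates all three cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actions after no lower-priority run-queue was found. */
enum model_action {
    MODEL_DEQUEUE_AND_EXIT,  /* (1) untouched, unpushable task */
    MODEL_RETRY,             /* (2) task moved, more pushable tasks remain */
    MODEL_EXIT,              /* (3) task moved, no pushable tasks remain */
};

/* Old compound conditional: retry only if the head changed and is non-empty. */
static enum model_action model_old_decide(int next_id, int head_id)
{
    if (head_id != 0 && head_id != next_id)
        return MODEL_RETRY;
    return MODEL_DEQUEUE_AND_EXIT;  /* (1) and (3) both reach the dequeue */
}

/* Three-way logic as described in the 2008 commit message. */
static enum model_action model_new_decide(int next_id, int head_id,
                                          bool same_cpu)
{
    assert(next_id > 0);
    if (same_cpu && head_id == next_id)
        return MODEL_DEQUEUE_AND_EXIT;  /* (1) */
    if (head_id == 0)
        return MODEL_EXIT;              /* (3) */
    return MODEL_RETRY;                 /* (2) */
}
```

For case (3), the old model dequeues a task that has already migrated, which matches the commit's description of dequeuing against a foreign run-queue. Note that the 2011 patch quoted above removes the dequeue from case (1), so with that patch applied the first branch would only exit.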

Best Regards,
Jane Zhou