Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16.06.2016 20:03, bsegall@xxxxxxxxxx wrote:
Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> writes:

Cgroup created inside throttled group must inherit current throttle_count.
Broken throttle_count allows to nominate throttled entries as a next buddy,
later this leads to null pointer dereference in pick_next_task_fair().

This patch initialize cfs_rq->throttle_count at first enqueue: laziness
allows to skip locking all rq at group creation. Lazy approach also allows
to skip full sub-tree scan at throttling hierarchy (not in this patch).

Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
Cc: Stable <stable@xxxxxxxxxxxxxxx> # v3.2+
---
  kernel/sched/fair.c  |   19 +++++++++++++++++++
  kernel/sched/sched.h |    2 +-
  2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..fe809fe169d2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
  	if (!cfs_bandwidth_used())
  		return;

+	/* synchronize hierarchical throttle counter */
+	if (unlikely(!cfs_rq->throttle_uptodate)) {
+		struct rq *rq = rq_of(cfs_rq);
+		struct cfs_rq *pcfs_rq;
+		struct task_group *tg;
+
+		cfs_rq->throttle_uptodate = 1;
+		/* get closest uptodate node because leaves goes first */
+		for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
+			pcfs_rq = tg->cfs_rq[cpu_of(rq)];
+			if (pcfs_rq->throttle_uptodate)
+				break;
+		}
+		if (tg) {
+			cfs_rq->throttle_count = pcfs_rq->throttle_count;
+			cfs_rq->throttled_clock_task = rq_clock_task(rq);
+		}
+	}
+

Checking just in enqueue is not sufficient - throttled_lb_pair can check
against a cfs_rq that has never been enqueued (and possibly other
paths).

Looks like this is minor problem: in worst case load-balancer will migrate
task into throttled hierarchy. And this could happens only once for each
newly created group.


It might also make sense to go ahead and initialize all the cfs_rqs we
skipped over to avoid some n^2 pathological behavior. You could also use
throttle_count == -1 or < 0. (We had our own version of this patch that
I guess we forgot to push?)

n^2 shouldn't be a problem while this happens only once for each group.

throttle_count == -1 could be overwritten when parent throttles/unthrottles
before initialization. We could set it to INT_MIN/2 and check <0 but this
will hide possible bugs here. One more int in the same cacheline shouldn't
add noticeable overhead.

I've also added this into our kernel to catch such problems without crash.
Probably it's worth to add into upstream because stale buddy is a real pain)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5506,8 +5506,11 @@ static void set_last_buddy(struct sched_entity *se)
        if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
                return;

-       for_each_sched_entity(se)
+       for_each_sched_entity(se) {
+               if (WARN_ON_ONCE(!se->on_rq))
+                       return;
                cfs_rq_of(se)->last = se;
+       }
 }

 static void set_next_buddy(struct sched_entity *se)
@@ -5515,8 +5518,11 @@ static void set_next_buddy(struct sched_entity *se)
        if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
                return;

-       for_each_sched_entity(se)
+       for_each_sched_entity(se) {
+               if (WARN_ON_ONCE(!se->on_rq))
+                       return;
                cfs_rq_of(se)->next = se;
+       }
 }

 static void set_skip_buddy(struct sched_entity *se)




  	/* an active group must be handled by the update_curr()->put() path */
  	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
  		return;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f3087b04..7cbeb92a1cb9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -437,7 +437,7 @@ struct cfs_rq {

  	u64 throttled_clock, throttled_clock_task;
  	u64 throttled_clock_task_time;
-	int throttled, throttle_count;
+	int throttled, throttle_count, throttle_uptodate;
  	struct list_head throttled_list;
  #endif /* CONFIG_CFS_BANDWIDTH */
  #endif /* CONFIG_FAIR_GROUP_SCHED */


--
Konstantin
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]