Hi Vincent and Kevin,

thanks a lot for your report. I was actually experiencing something
that I think is related to your issue, but then I didn't find any time
to send out a proper patch :/. Could you please test what I've attached
and see if it fixes your problem?

Thanks a lot,

- Juri

On 28 August 2014 22:07, Vincent Legout <vincent@xxxxxxxxxxx> wrote:
> Hello,
>
> Juri Lelli <juri.lelli@xxxxxxxxx> writes:
>
>> On Wed, 2 Jul 2014 17:08:47 -0400
>> Kevin Burns <kevinpb@xxxxxx> wrote:
>>
>>> Here's the issue:
>>>
>>> I am able to allocate a bandwidth with a ratio of .1 to two
>>> processes using the sched_setattr() system call.
>>>
>>> I am then able to add said tasks to a cpuset (with one physical
>>> processor) using cset.
>>>
>>> However, when I then try to update the runtime or period of either
>>> task, sched_setattr() returns an -EBUSY error.
>>>
>>> Now, if I repeat the above experiment with just one task, I am able
>>> to update the runtime or period without issue. I ran trace-cmd and
>>> kernelshark to verify that the bandwidths were indeed being updated
>>> correctly. That, and htop was reporting a higher percentage of CPU
>>> usage, which correlated to the ratios of my tasks' bandwidth.
>>>
>>> Any ideas as to why cpuset would cause this behaviour?
>>>
>>
>> Could you create a script that I can use to run your setup and
>> reproduce the problem?
>
> Sorry for the delayed answer. I'm working with Kevin and the problem
> can be reproduced using the attached files, also available here:
>
> http://legout.info/~vincent/sd/
>
> On an Ubuntu 14.04 system running Linux 3.16, when running run.sh for
> the 2nd time, the 2nd call to sched_setattr() returns EBUSY.
> Uncommenting line 41 of run.sh fixes this by returning to SCHED_OTHER
> before moving the task to the cpuset.
>
> The problem arises when using both cpusets and SCHED_DEADLINE.
> The process described in section 5.1 of the SCHED_DEADLINE
> documentation works fine if the process stays in the same cpuset, but
> I think there are some issues when moving a process already in the
> SCHED_DEADLINE policy from one cpuset to another.
>
> According to our experiments, it seems that some fields are not
> updated during this process, and it thus fails. When a task moves
> from one cpuset to another, the total_bw fields of both cpusets don't
> seem to be updated. Thus, in the next sched_setattr() call,
> __dl_overflow() returns 1 because it thinks total_bw is 0 in the new
> cpuset. Then, dl_overflow() returns -1 and we get an EBUSY error.
>
> The total_bw field may also overflow because __dl_clear() and
> __dl_add() are called while the task whose bandwidth is tsk_bw is not
> in the cpu represented by dl_b.
>
> We can get around this by moving the process back to another
> scheduling policy before moving it to another cpuset. But we also had
> to apply the following patch in order to be sure that the bandwidth
> is always updated (on top of v3.16). I'd think this condition was
> added to skip all the tests if the bandwidth doesn't change. But
> there is an issue because then, the total_bw field is not going to be
> updated for the new cpu. I'd think the problem comes from the fact
> that p->dl.dl_bw is not updated when a task leaves or rejoins the
> SCHED_DEADLINE policy.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bc1638b..0df3008 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2031,9 +2031,6 @@ static int dl_overflow(struct task_struct *p, int policy,
>  	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
>  	int cpus, err = -1;
>
> -	if (new_bw == p->dl.dl_bw)
> -		return 0;
> -
>  	/*
>  	 * Either if a task, enters, leave, or stays -deadline but changes
>  	 * its parameters, we may need to update accordingly the total
>
> I hope the above explanations make sense and I didn't miss anything
> trivial.
> I'd be happy to provide more information or test anything if needed.
>
> Thanks,
> Vincent
>
From 2243d25fbf0df112a83c12ca8a05e4384ea17058 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@xxxxxxx>
Date: Wed, 11 Jun 2014 16:37:31 +0200
Subject: [PATCH] sched/deadline: clear dl_entity params when setscheduling to different class

When a task is using SCHED_DEADLINE and the user setschedules it to a
different class its sched_dl_entity static parameters are not cleaned
up. This causes a bug if the user sets it back to SCHED_DEADLINE with
the same parameters again. The problem resides in the check we perform
at the very beginning of dl_overflow():

	if (new_bw == p->dl.dl_bw)
		return 0;

This condition is met in the case depicted above, so the function
returns and dl_b->total_bw is not updated (the p->dl.dl_bw is not added
to it). After this, admission control is broken.

This patch fixes the thing, properly clearing static parameters for a
task that ceases to use SCHED_DEADLINE.

Reported-by: Daniele Alessandrelli <daniele.alessandrelli@xxxxxxxxx>
Signed-off-by: Juri Lelli <juri.lelli@xxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Juri Lelli <juri.lelli@xxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
 kernel/sched/core.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..01738b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1776,6 +1776,21 @@ int wake_up_state(struct task_struct *p, unsigned int state)
 }
 
 /*
+ * This function clears the sched_dl_entity static params.
+ */
+static void
+__dl_clear_params(struct task_struct *p)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	dl_se->dl_runtime = 0;
+	dl_se->dl_deadline = 0;
+	dl_se->dl_period = 0;
+	dl_se->flags = 0;
+	dl_se->dl_bw = 0;
+}
+
+/*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
  *
@@ -1799,10 +1814,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	p->dl.dl_runtime = p->dl.runtime = 0;
-	p->dl.dl_deadline = p->dl.deadline = 0;
-	p->dl.dl_period = 0;
-	p->dl.flags = 0;
+	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 
@@ -2060,6 +2072,7 @@ static int dl_overflow(struct task_struct *p, int policy,
 		err = 0;
 	} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
 		__dl_clear(dl_b, p->dl.dl_bw);
+		__dl_clear_params(p);
 		err = 0;
 	}
 	raw_spin_unlock(&dl_b->lock);
-- 
2.0.4