Re: [PATCH 2/2] sched/deadline: Correctly account for allocated bandwidth during hotplug

On 11/13/24 7:57 AM, Juri Lelli wrote:
For hotplug operations, DEADLINE needs to check that there is still enough
bandwidth left after removing the CPU that is going offline. We however
fail to do so currently.

Restore the correct behavior by restructuring dl_bw_manage() a bit, so
that overflow conditions (not enough bandwidth left) are properly
checked. Also account for dl_server bandwidth, i.e. discount it in the
calculation, since NORMAL tasks will be moved away from the CPU anyway
as a result of the hotplug operation.

Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
---
  kernel/sched/core.c     |  2 +-
  kernel/sched/deadline.c | 33 ++++++++++++++++++++++++---------
  kernel/sched/sched.h    |  2 +-
  3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 43e453ab7e20..d1049e784510 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8057,7 +8057,7 @@ static void cpuset_cpu_active(void)
  static int cpuset_cpu_inactive(unsigned int cpu)
  {
  	if (!cpuhp_tasks_frozen) {
-		int ret = dl_bw_check_overflow(cpu);
+		int ret = dl_bw_deactivate(cpu);

  		if (ret)
  			return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e53208a50279..609685c5df05 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3467,29 +3467,31 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
  }

  enum dl_bw_request {
-	dl_bw_req_check_overflow = 0,
+	dl_bw_req_deactivate = 0,
  	dl_bw_req_alloc,
  	dl_bw_req_free
  };

  static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
  {
-	unsigned long flags;
+	unsigned long flags, cap;
  	struct dl_bw *dl_b;
  	bool overflow = 0;
+	u64 fair_server_bw = 0;

  	rcu_read_lock_sched();
  	dl_b = dl_bw_of(cpu);
  	raw_spin_lock_irqsave(&dl_b->lock, flags);

-	if (req == dl_bw_req_free) {
+	cap = dl_bw_capacity(cpu);
+	switch (req) {
+	case dl_bw_req_free:
  		__dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
-	} else {
-		unsigned long cap = dl_bw_capacity(cpu);
-
+		break;
+	case dl_bw_req_alloc:
  		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
-
-		if (req == dl_bw_req_alloc && !overflow) {
+		if (!overflow) {
  			/*
  			 * We reserve space in the destination
  			 * root_domain, as we can't fail after this point.
@@ -3498,6 +3500,19 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
  			 */
  			__dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
  		}
+		break;
+	case dl_bw_req_deactivate:
+		/*
+		 * cpu is going offline and NORMAL tasks will be moved away
+		 * from it. We can thus discount dl_server bandwidth
+		 * contribution as it won't need to be servicing tasks after
+		 * the cpu is off.
+		 */
+		if (cpu_rq(cpu)->fair_server.dl_server)
+			fair_server_bw = cpu_rq(cpu)->fair_server.dl_bw;
+
+		overflow = __dl_overflow(dl_b, cap, fair_server_bw, 0);
+		break;

This part can still cause a failure in one of the test cases in my cpuset partition test script. In this particular case, the CPU to be offlined is an isolated CPU with scheduling disabled. As a result, total_bw is 0 and the __dl_overflow() test fails. Is there a way to skip the __dl_overflow() test for isolated CPUs? Can we use a null total_bw as a proxy for that?

Thanks,
Longman
