Re: [PATCH 2/2] lib/percpu_counter: fix dying cpu compare race

On 2023/4/4 10:50, Yury Norov wrote:
On Tue, Apr 04, 2023 at 09:42:06AM +0800, Ye Bin wrote:
From: Ye Bin <yebin10@xxxxxxxxxx>

In commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") a race
condition between a cpu dying and percpu_counter_sum() iterating online
CPUs was identified.
Actually, the same race condition exists between a cpu dying and
__percpu_counter_compare(), which uses 'num_online_cpus()' for its quick
judgment. Since 'num_online_cpus()' is decreased before
'percpu_counter_cpu_dead()' is called, the quick judgment may return an
incorrect result.
To solve this, also count dying CPUs when doing the quick judgment in
__percpu_counter_compare().
Not sure I completely understood the race you are describing. All CPU
accounting is protected with percpu_counters_lock. Is it a real race
that you've faced, or hypothetical? If it's real, can you share stack
traces?
Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
---
  lib/percpu_counter.c | 11 ++++++++++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 5004463c4f9f..399840cb0012 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -227,6 +227,15 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
  	return 0;
  }
+static __always_inline unsigned int num_count_cpus(void)
This doesn't look like a good name. Maybe num_offline_cpus?

+{
+#ifdef CONFIG_HOTPLUG_CPU
+	return (num_online_cpus() + num_dying_cpus());
        ^                                    ^
  'return' is not a function. The parentheses are not needed.

Generally speaking, a sequence of atomic operations is not an atomic
operation, so the above doesn't look correct. I don't think that it
would be possible to implement raceless accounting based on 2 separate
counters.
Yes, there is indeed a concurrency issue with doing so here. But I saw that the hotplug process first sets cpu_dying_mask and only then decreases the number of online CPUs, so the total may be larger than the actual value and we may fall back to the slow path. But this won't cause any problems.


Most probably, you'd have to use the same approach as in 8b57b11cca88:

         lock();
         for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
                 cnt++;
         unlock();

And if so, I'd suggest to implement cpumask_weight_or() for that.

+#else
+	return num_online_cpus();
+#endif
+}
+
  /*
   * Compare counter against given value.
   * Return 1 if greater, 0 if equal and -1 if less
@@ -237,7 +246,7 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
count = percpu_counter_read(fbc);
  	/* Check to see if rough count will be sufficient for comparison */
-	if (abs(count - rhs) > (batch * num_online_cpus())) {
+	if (abs(count - rhs) > (batch * num_count_cpus())) {
  		if (count > rhs)
  			return 1;
  		else
--
2.31.1