Re: [PATCH 2/2] lib/percpu_counter: fix dying cpu compare race

On Tue, Apr 04, 2023 at 09:42:06AM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@xxxxxxxxxx>
> 
> In commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race") a race
> condition between a cpu dying and percpu_counter_sum() iterating online CPUs
> was identified.
> Actually, the same race condition exists between a cpu dying and
> __percpu_counter_compare(), which uses 'num_online_cpus()' for its quick check.
> But 'num_online_cpus()' is decreased before 'percpu_counter_cpu_dead()' is
> called, so the quick check may return an incorrect result.
> To solve this, also add the dying CPU count when doing the quick check
> in __percpu_counter_compare().

Not sure I completely understood the race you are describing. All CPU
accounting is protected with percpu_counters_lock. Is it a real race
that you've faced, or hypothetical? If it's real, can you share stack
traces?
 
> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> ---
>  lib/percpu_counter.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 5004463c4f9f..399840cb0012 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -227,6 +227,15 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
>  	return 0;
>  }
>  
> +static __always_inline unsigned int num_count_cpus(void)

This doesn't look like a good name. Maybe num_offline_cpus?

> +{
> +#ifdef CONFIG_HOTPLUG_CPU
> +	return (num_online_cpus() + num_dying_cpus());

               ^                                    ^ 
         'return' is not a function. Parentheses are not needed.

Generally speaking, a sequence of atomic operations is not an atomic
operation, so the above doesn't look correct. I don't think that it
would be possible to implement raceless accounting based on 2 separate
counters.
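
To illustrate the point, here is a toy userspace model, not kernel code.
The names nr_online, nr_dying, cpu_leave_online(), cpu_enter_dying() and
racy_total() are made up for this sketch, and the order of the two updates
is an assumption chosen only to show the window:

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the two counts; these are not kernel symbols. */
static atomic_uint nr_online = 4;
static atomic_uint nr_dying;

/* The transition is two separate atomic updates (order assumed here). */
static void cpu_leave_online(void)
{
	atomic_fetch_sub(&nr_online, 1);
}

static void cpu_enter_dying(void)
{
	atomic_fetch_add(&nr_dying, 1);
}

/* Two atomic loads; the pair of loads is not atomic as a whole. */
static unsigned int racy_total(void)
{
	return atomic_load(&nr_online) + atomic_load(&nr_dying);
}
```

A reader that calls racy_total() between the two updates sees one CPU
fewer than actually exists, although every individual access is atomic.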

Most probably, you'd have to use the same approach as in 8b57b11cca88:

        lock();
        for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
                cnt++;
        unlock();

And if so, I'd suggest implementing cpumask_weight_or() for that.
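
A rough userspace sketch of the semantics I have in mind, assuming a
single 64-bit word stands in for struct cpumask. mask_weight_or() is a
hypothetical name used only here; a real cpumask_weight_or() would
operate on cpumasks and does not exist yet:

```c
#include <stdint.h>

/*
 * Count the bits set in (a | b) without building a temporary mask.
 * A CPU present in both masks is counted once.
 */
static unsigned int mask_weight_or(uint64_t a, uint64_t b)
{
	uint64_t m = a | b;
	unsigned int weight = 0;

	while (m) {
		m &= m - 1;	/* clear the lowest set bit */
		weight++;
	}
	return weight;
}
```

Unlike adding two separate weights, this counts a CPU that is set in
both the online and the dying mask only once.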

> +#else
> +	return num_online_cpus();
> +#endif
> +}
> +
>  /*
>   * Compare counter against given value.
>   * Return 1 if greater, 0 if equal and -1 if less
> @@ -237,7 +246,7 @@ int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch)
>  
>  	count = percpu_counter_read(fbc);
>  	/* Check to see if rough count will be sufficient for comparison */
> -	if (abs(count - rhs) > (batch * num_online_cpus())) {
> +	if (abs(count - rhs) > (batch * num_count_cpus())) {
>  		if (count > rhs)
>  			return 1;
>  		else
> -- 
> 2.31.1



