Re: [PATCH 1/7] MIPS: sync-r4k: Rework to be many cores friendly

peterz@xxxxxxxxxxxxx · Mon, 17 Aug 2020 09:55:33 +0200

On Mon, Aug 17, 2020 at 11:46:40AM +0800, Jiaxun Yang wrote:
> Here we reworked the whole procedure. Now the synchronise event on CPU0
> is triggered by smp call function, and we won't touch the count on CPU0
> at all.

Are you telling me, that in 2020 you're building chips that need
horrible crap like this ?!?

> +#define MAX_LOOPS	1000
> +
> +void synchronise_count_master(void *unused)
>  {
>  	unsigned long flags;
> +	long delta;
> +	int i;
>  
> +	if (atomic_read(&sync_stage) != STAGE_START)
> +		BUG();

	BUG_ON(atomic_read(&sync_stage) != STAGE_START);

>  
>  	local_irq_save(flags);

That's silly, replace with: lockdep_assert_hardirqs_disabled().

>  
> +	cur_count = read_c0_count();
> +	smp_wmb();
> +	atomic_inc(&sync_stage); /* inc to STAGE_MASTER_READY */

Memory barriers require a comment that describes the ordering. This
includes at least 2 variables and at least 2 code paths (*) -- afaict
your code does NOT have a matching barrier, see below.
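As a rough illustration of what such a comment looks like, here is a
userspace sketch using C11 atomics rather than kernel primitives. The
names mp_data, mp_flag, mp_writer() and mp_reader() are hypothetical and
not part of the patch; the release and acquire fences only approximate
smp_wmb() and smp_rmb(). The point is that each barrier names the two
variables it orders and the barrier it pairs with on the other code path.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins: mp_data plays the role of cur_count, mp_flag of sync_stage. */
static _Atomic uint32_t mp_data;
static atomic_int mp_flag;

void mp_writer(uint32_t value)
{
	atomic_store_explicit(&mp_data, value, memory_order_relaxed);
	/*
	 * (A) Order the store to mp_data before the store to mp_flag.
	 * Pairs with the acquire fence (B) in mp_reader(): a reader that
	 * observes mp_flag == 1 must also observe the new mp_data.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&mp_flag, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if the data was published, 0 otherwise. */
int mp_reader(uint32_t *out)
{
	if (atomic_load_explicit(&mp_flag, memory_order_relaxed) != 1)
		return 0;
	/*
	 * (B) Order the load of mp_flag before the load of mp_data.
	 * Pairs with the release fence (A) in mp_writer().
	 */
	atomic_thread_fence(memory_order_acquire);
	*out = atomic_load_explicit(&mp_data, memory_order_relaxed);
	return 1;
}
```

Without (B) the reader could observe the flag and still read stale data,
which is why a barrier on only one side is not enough.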

>  
> +	for (i = 0; i < MAX_LOOPS; i++) {
> +		cur_count = read_c0_count();
>  		smp_wmb();
> -		atomic_inc(&count_count_stop);
> +		if (atomic_read(&sync_stage) == STAGE_SLAVE_SYNCED)
> +			break;
>  	}
> +
> +	delta = read_c0_count() - fini_count;
>  
>  	local_irq_restore(flags);
>  
> +	if (i == MAX_LOOPS)
> +		pr_err("sync-r4k: Master: synchronise timeout\n");
> +	else
> +		pr_info("sync-r4k: Master: synchronise succeed, maximum delta: %ld\n", delta);
> +
> +	return;
>  }
>  
>  void synchronise_count_slave(int cpu)
>  {
>  	int i;
>  	unsigned long flags;
> +	call_single_data_t csd;
>  
> +	raw_spin_lock(&sync_r4k_lock);

Why should this be a raw_spinlock_t ?

>  
> +	/* Let variables get attention from cache */
> +	for (i = 0; i < MAX_LOOPS; i++) {
> +		cur_count++;
> +		fini_count += cur_count;
> +		cur_count += fini_count;
>  	}

What does this actually do? You're going to bounce those variables
between this CPU and CPU-0.

> +
> +	atomic_set(&sync_stage, STAGE_START);
> +	csd.func = synchronise_count_master;
> +
> +	/* Master count is always CPU0 */
> +	if (smp_call_function_single_async(0, &csd)) {

This is disgusting.

It also requires a comment on how the on-stack csd is correct (it is,
but it really needs a comment).

> +		pr_err("sync-r4k: Slave: Failed to call master\n");
> +		raw_spin_unlock(&sync_r4k_lock);
> +		return;
> +	}
> +
> +	local_irq_save(flags);
> +
> +	/* Wait until master ready */
> +	while (atomic_read(&sync_stage) != STAGE_MASTER_READY)
> +		cpu_relax();

This really wants to be:

	atomic_cond_read_acquire(&sync_stage, VAL == STAGE_MASTER_READY);

Because, afaict the smp_wmb() (*) in synchronise_count_master() orders
against this here and we need to guarantee we read @sync_stage _before_
@cur_count.
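To make the required ordering concrete, a minimal userspace sketch with
C11 atomics follows. It is an analogue, not kernel code: stage_w, count_w,
master_ready() and slave_wait_master() are hypothetical names, the release
fence stands in for smp_wmb(), and the acquire load in the loop stands in
for atomic_cond_read_acquire(). The enum values are assumed to start at 0.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stage names follow the patch; their numeric values are an assumption. */
enum { STAGE_START, STAGE_MASTER_READY };

static atomic_int stage_w;	/* stands in for sync_stage, starts at STAGE_START */
static uint32_t count_w;	/* stands in for cur_count, deliberately non-atomic */

/* Master side: mirrors "cur_count = ...; smp_wmb(); atomic_inc(&sync_stage);" */
void master_ready(uint32_t count)
{
	count_w = count;
	/* Order the count_w store before the stage increment. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&stage_w, 1, memory_order_relaxed);
}

/* Slave side: spin with acquire semantics, then read the count. */
uint32_t slave_wait_master(void)
{
	/*
	 * The acquire load pairs with the release fence in master_ready():
	 * once STAGE_MASTER_READY is observed, the count_w read below
	 * cannot be satisfied before the stage read.
	 */
	while (atomic_load_explicit(&stage_w, memory_order_acquire) !=
	       STAGE_MASTER_READY)
		;	/* the kernel loop would call cpu_relax() here */
	return count_w;
}
```

With a relaxed load in the loop instead, nothing would stop the count_w
read from being hoisted above the stage check.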

> +
> +	write_c0_count(cur_count);
> +	fini_count = read_c0_count();
> +	smp_wmb();
> +	atomic_inc(&sync_stage); /* inc to STAGE_SLAVE_SYNCED */
>  
>  	local_irq_restore(flags);
> +
> +	raw_spin_unlock(&sync_r4k_lock);
>  }

Furthermore, afaict there isn't actually any concurrency on @sync_stage,
so atomic_t isn't required. Using smp_store_release() to change state
might be far more natural.
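A sketch of that shape, again as a userspace C11 analogue and under stated
assumptions: struct sync_demo and the demo_*() helpers are hypothetical,
and the enum values are assumed. C11 still needs an atomic type for the
stage, but every transition is a plain release store of an explicit stage
value, never a read-modify-write, because each transition has exactly one
writer. In the kernel this would correspond to smp_store_release() on the
writing side and smp_load_acquire() on the reading side.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stage names follow the patch; their numeric values are an assumption. */
enum { STAGE_START, STAGE_MASTER_READY, STAGE_SLAVE_SYNCED };

struct sync_demo {
	atomic_int stage;
	uint32_t cur_count;	/* written by master, read by slave */
	uint32_t fini_count;	/* written by slave, read by master */
};

void demo_init(struct sync_demo *s)
{
	atomic_init(&s->stage, STAGE_START);
	s->cur_count = 0;
	s->fini_count = 0;
}

/* Publish all earlier stores, then move to the named stage. */
void demo_set_stage(struct sync_demo *s, int next)
{
	atomic_store_explicit(&s->stage, next, memory_order_release);
}

int demo_stage(struct sync_demo *s)
{
	return atomic_load_explicit(&s->stage, memory_order_acquire);
}

void demo_master_step(struct sync_demo *s, uint32_t now)
{
	s->cur_count = now;
	demo_set_stage(s, STAGE_MASTER_READY);
}

/* Returns 1 if the slave consumed the master's count, 0 if not ready yet. */
int demo_slave_step(struct sync_demo *s)
{
	if (demo_stage(s) != STAGE_MASTER_READY)
		return 0;
	s->fini_count = s->cur_count;
	demo_set_stage(s, STAGE_SLAVE_SYNCED);
	return 1;
}
```

Naming the target stage in each store also removes the need for comments
like "inc to STAGE_MASTER_READY" next to an increment.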