On Mon, Aug 17, 2020 at 11:46:40AM +0800, Jiaxun Yang wrote:
> Here we reworked the whole procdure. Now the synchronise event on CPU0
> is triggered by smp call function, and we won't touch the count on CPU0
> at all.

Are you telling me that in 2020 you're building chips that need
horrible crap like this?!?

> +#define MAX_LOOPS 1000
> +
> +void synchronise_count_master(void *unused)
>  {
>  	unsigned long flags;
> +	long delta;
> +	int i;
>
> +	if (atomic_read(&sync_stage) != STAGE_START)
> +		BUG();

	BUG_ON(atomic_read(&sync_stage) != STAGE_START);

>
>  	local_irq_save(flags);

That's silly, replace with:

	lockdep_assert_hardirqs_disabled();

>
> +	cur_count = read_c0_count();
> +	smp_wmb();
> +	atomic_inc(&sync_stage); /* inc to STAGE_MASTER_READY */

Memory barriers require a comment that describes the ordering. That
comment has to name at least 2 variables and at least 2 code paths (*).
Afaict your code does NOT have a matching barrier, see below.

>
> +	for (i = 0; i < MAX_LOOPS; i++) {
> +		cur_count = read_c0_count();
>  		smp_wmb();
> -		atomic_inc(&count_count_stop);
> +		if (atomic_read(&sync_stage) == STAGE_SLAVE_SYNCED)
> +			break;
>  	}
> +
> +	delta = read_c0_count() - fini_count;
>
>  	local_irq_restore(flags);
>
> +	if (i == MAX_LOOPS)
> +		pr_err("sync-r4k: Master: synchronise timeout\n");
> +	else
> +		pr_info("sync-r4k: Master: synchronise succeed, maximum delta: %ld\n", delta);
> +
> +	return;
>  }
>
>  void synchronise_count_slave(int cpu)
>  {
>  	int i;
>  	unsigned long flags;
> +	call_single_data_t csd;
>
> +	raw_spin_lock(&sync_r4k_lock);

Why should this be a raw_spinlock_t?

>
> +	/* Let variables get attention from cache */
> +	for (i = 0; i < MAX_LOOPS; i++) {
> +		cur_count++;
> +		fini_count += cur_count;
> +		cur_count += fini_count;
>  	}

What does this actually do? You're going to bounce those variables
between this CPU and CPU-0.

> +
> +	atomic_set(&sync_stage, STAGE_START);
> +	csd.func = synchronise_count_master;
> +
> +	/* Master count is always CPU0 */
> +	if (smp_call_function_single_async(0, &csd)) {

This is disgusting. It also requires a comment on how the on-stack csd
is correct (it is, but it really needs a comment).

> +		pr_err("sync-r4k: Salve: Failed to call master\n");
> +		raw_spin_unlock(&sync_r4k_lock);
> +		return;
> +	}
> +
> +	local_irq_save(flags);
> +
> +	/* Wait until master ready */
> +	while (atomic_read(&sync_stage) != STAGE_MASTER_READY)
> +		cpu_relax();

This really wants to be:

	atomic_cond_read_acquire(&sync_stage, VAL == STAGE_MASTER_READY);

Because, afaict, the smp_wmb() (*) in synchronise_count_master() orders
against this load, and we need to guarantee we read @sync_stage _before_
@cur_count.

> +
> +	write_c0_count(cur_count);
> +	fini_count = read_c0_count();
> +	smp_wmb();
> +	atomic_inc(&sync_stage); /* inc to STAGE_SLAVE_SYNCED */
>
>  	local_irq_restore(flags);
> +
> +	raw_spin_unlock(&sync_r4k_lock);
>  }

Furthermore, afaict there isn't actually any concurrency on @sync_stage,
so atomic_t isn't required. Using smp_store_release() to change state
might be far more natural.
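
To make the release/acquire pairing concrete, here is a rough user-space
C11 analogue of the handshake. It is only a sketch under assumptions,
not kernel code and not tested against this patch: fake_read_count() and
run_sync_demo() are hypothetical names, pthreads stand in for the two
CPUs, and the master's refresh loop and MAX_LOOPS timeout are left out.
The release store plays the role of smp_store_release() and the acquire
spin the role of smp_cond_load_acquire(). C11 needs the flag to be
_Atomic for the spin to be well defined; @cur_count and @fini_count stay
plain variables.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { STAGE_START, STAGE_MASTER_READY, STAGE_SLAVE_SYNCED };

static _Atomic int sync_stage;	/* only ever written with release stores */
static uint32_t cur_count;	/* plain data, published by the master */
static uint32_t fini_count;	/* plain data, published by the slave */

/* Hypothetical stand-in for read_c0_count(); returns a fixed value. */
static uint32_t fake_read_count(void)
{
	return 0x1234u;
}

static void *master(void *unused)
{
	(void)unused;

	cur_count = fake_read_count();
	/*
	 * Release: the store to cur_count above is ordered before the
	 * stage change. Pairs with the acquire load in slave().
	 */
	atomic_store_explicit(&sync_stage, STAGE_MASTER_READY,
			      memory_order_release);

	/* Acquire: pairs with the slave's release; fini_count is valid after. */
	while (atomic_load_explicit(&sync_stage, memory_order_acquire) !=
	       STAGE_SLAVE_SYNCED)
		;

	return (void *)(uintptr_t)fini_count;
}

static uint32_t slave(void)
{
	uint32_t seen;

	/* Acquire: guarantees sync_stage is read _before_ cur_count. */
	while (atomic_load_explicit(&sync_stage, memory_order_acquire) !=
	       STAGE_MASTER_READY)
		;

	seen = cur_count;	/* stands in for write_c0_count(cur_count) */
	fini_count = seen;	/* stands in for the read_c0_count() readback */

	/* Release: the store to fini_count is ordered before the stage change. */
	atomic_store_explicit(&sync_stage, STAGE_SLAVE_SYNCED,
			      memory_order_release);
	return seen;
}

/* Returns 0 when both sides observed the published values, -1 on setup failure. */
int run_sync_demo(void)
{
	pthread_t t;
	void *ret;
	uint32_t seen;

	atomic_store_explicit(&sync_stage, STAGE_START, memory_order_relaxed);
	if (pthread_create(&t, NULL, master, NULL))
		return -1;

	seen = slave();
	pthread_join(t, &ret);

	return (seen == 0x1234u && (uint32_t)(uintptr_t)ret == 0x1234u) ? 0 : 1;
}
```

Calling run_sync_demo() from a main() and checking for 0 is enough to
exercise it. Each stage has exactly one writer, so no atomic
read-modify-write is needed, and each release store is matched by an
acquire load on the other side, which is the pairing the barrier
comment should spell out.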