On Mon, May 08, 2023 at 09:43:39PM +0200, Thomas Gleixner wrote:

> @@ -233,14 +237,31 @@ static void notrace start_secondary(void
>  	load_cr3(swapper_pg_dir);
>  	__flush_tlb_all();
>  #endif
> +	/*
> +	 * Sync point with wait_cpu_initialized(). Before proceeding through
> +	 * cpu_init(), the AP will call wait_for_master_cpu() which sets its
> +	 * own bit in cpu_initialized_mask and then waits for the BSP to set
> +	 * its bit in cpu_callout_mask to release it.
> +	 */
>  	cpu_init_secondary();
>  	rcu_cpu_starting(raw_smp_processor_id());
>  	x86_cpuinit.early_percpu_clock_init();
> +
> +	/*
> +	 * Sync point with wait_cpu_callin(). The AP doesn't wait here
> +	 * but just sets the bit to let the controlling CPU (BSP) know that
> +	 * it's got this far.
> +	 */
>  	smp_callin();
>
> -	/* otherwise gcc will move up smp_processor_id before the cpu_init */
> +	/* Otherwise gcc will move up smp_processor_id() before cpu_init() */
>  	barrier();

Not to the detriment of this patch, but this barrier() and its comment
seem weird vs smp_callin(). That function ends with an atomic bitop (it
has to; at the very least it must not be weaker than store-release) but
also has an explicit wmb() to order setup vs CPU_STARTING.

(Arguably that should be a full fence *AND* get a comment.)

There is no way the smp_processor_id() referred to in this comment can
land before cpu_init(), even without the barrier().

> -	/* Check TSC synchronization with the control CPU: */
> +
> +	/*
> +	 * Check TSC synchronization with the control CPU, which will do
> +	 * its part of this from wait_cpu_online(), making it an implicit
> +	 * synchronization point.
> +	 */
>  	check_tsc_sync_target();
>
>  	/*
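
For illustration only, here is a user-space sketch of the store-release
point made above. It is not kernel code: it uses C11 atomics and pthreads
in place of the kernel's bitops and barriers, and every name in it
(ap_setup_state, callin_mask, ap_thread, run_callin_demo,
compiler_barrier) is hypothetical. The assumption is that a "callin" can
be modelled as setting a bit in a shared mask with release semantics,
which the waiting side reads with acquire semantics. Under that
assumption the release operation already orders the earlier setup
stores, so a compiler-only barrier placed after it adds nothing to that
ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of this is kernel code. */
static int ap_setup_state;        /* plain data written before the callin */
static atomic_ulong callin_mask;  /* models a per-CPU "called in" bitmask */

/* Compiler-only barrier, the same idea as the kernel's barrier(). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Models the AP: do the setup, then announce it with a release bitop. */
static void *ap_thread(void *arg)
{
	unsigned long bit = 1UL << (unsigned long)(uintptr_t)arg;

	ap_setup_state = 42;	/* setup that must be visible to the waiter */

	/* Release RMW: earlier stores cannot be reordered after this. */
	atomic_fetch_or_explicit(&callin_mask, bit, memory_order_release);

	/* Only constrains the compiler; the ordering above is already done. */
	compiler_barrier();
	return NULL;
}

/* Models the BSP: wait for the bit, then read what the AP set up. */
int run_callin_demo(void)
{
	pthread_t ap;
	unsigned long bit = 1UL << 1;
	int rc, seen;

	ap_setup_state = 0;
	atomic_store(&callin_mask, 0);

	rc = pthread_create(&ap, NULL, ap_thread, (void *)(uintptr_t)1);
	assert(rc == 0);

	/* Acquire load pairs with the release bitop in ap_thread(). */
	while (!(atomic_load_explicit(&callin_mask, memory_order_acquire) & bit))
		;

	seen = ap_setup_state;	/* guaranteed to observe the AP's store */
	pthread_join(ap, NULL);
	return seen;
}
```

Calling run_callin_demo() from a main() built with something like
"cc -pthread" returns the value the AP thread stored before setting its
bit. The sketch says nothing about whether the kernel's wmb() should
become a full fence; it only shows the release/acquire pairing.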