Re: [PATCH v16 3/8] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Fri, 24 Mar 2023 14:57:13 +0100

On Fri, Mar 24 2023 at 09:31, David Woodhouse wrote:
> On Fri, 2023-03-24 at 02:16 +0100, Thomas Gleixner wrote:
>> So the proper thing is to split CPUHP_BRINGUP_CPU, which is the bridging
>> state between AP and BP today, into a set of synchronization points
>> between BP and AP.
>
> I feel we're talking past each other a little bit. Because I'd have
> said that's *exactly* what this patch set is doing.
>
> The existing x86 CPUHP_BRINGUP_CPU step in native_cpu_up() has a bunch
> of back-and-forth between BP and AP, which you've enumerated below and
> which I cleaned up and commented in the 'Split up native_cpu_up into
> separate phases and document them' patch.
>
> This patch set literally introduces the new PARALLEL_DYN states to be
> "a set of synchronization points between BP and AP", and then makes the
> x86 code utilise that for the *first* of its back-and-forth exchanges
> between BP and AP.

It provides a dynamic space which is absolutely unspecified and just
opens the door for tinkering.

This first step does not even contain a synchronization point. All it
does is to kick the AP into gear. Not more, not less. Naming this
obscurely as PARALLEL_DYN is a tasteless bandaid.

If you go and look at all __cpu_up() implementations, then you'll notice
that these are very similar. All of them do

   1) Kick AP
   2) Synchronize at least once between BP and AP

There is nothing x86 specific about this. So instead of hiding this
behind a misnomed dynamic space, the obviously correct solution is to
make this an explicit mechanism in the core code and convert _all_
architectures over to this scheme. That's in the first place completely
independent of parallel bringup.

It has a value on it's own as it consolidates the zoo of synchronization
mechanisms (cpumasks, completions, random variables) into one shared
mechanism in the core code.

That's very much different from what your patch is doing. And there is a
very good reason aside of consolidation to do so:

This prepares to handle the parallelism requirements in the core code
instead of letting each architecture come up with its own set of
magic. Which in turn is a prerequisite for allowing the STARTING
callbacks or later the threaded AP states to execute in parallel.

Why? Simply because of this:

  BP			AP              state
  kick()                                BRINGUP_CPU
                        startup()                      
  sync()                sync() 
                        starting()      advances to AP_ONLINE
  sync()                sync()
  TSC_sync()            TSC_sync()
  wait_for_online()     set_online()
                        cpu_startup_entry() AP_ONLINE_IDLE
  wait_for_completion() complete()

This works correctly today because bringup_cpu() does not modify state
and excpects the state to be advanced by the AP once the completion is
done.

So you _cannot_ just throw some magic dynamic states before BRINGUP_CPU
and then expect that the state machine is consistent when the AP is
allowed to run the starting callbacks in parallel.

The sync point after the starting callbacks simply cannot be in that
dynamic state space. It has to be _after_ the AP starting states.

That needs a fundamental change in the way how the state management
at this point works and this needs to be done upfront. Aside of the
general serialization aspects this needs some deep thought whether the
BP control can stay single threaded or if it's required to spawn a
control thread per AP.

The kick CPU into life state is completely independent of the above and
can be moved just before BRINGUP_CPU without violating anything. But
that's where the easy to solve part stops hard.

You might find my wording offensive, but I perceive your "let's use a
few dynamic states and see what sticks" approach offensive too.

The analysis of all this is not rocket science and could have been done
long ago by yourself.

Thanks,

        tglx