[Hotplug_sig] Hotplug CPU Documentation

> ========================================================================
> LINUX KERNEL HOTPLUG CPU DOCUMENTATION:
> 
> Table of contents:
> 	1.	Introduction: why do we need hotplug CPU?
> 	2.	Which parts of the kernel are affected by Hotplug CPU?
> 	3.	Flow Chart for taking a CPU offline
> 	4.	Flow Chart for bringing a CPU online
> 	5.	Infrastructure required to support CPU hotplug?
> 	(new structs, locks, semaphores, functions, etc...)
> 	6.	Architecture specifics
> 	7.	Remaining Issues
> 	8.	How to use it
> 	9.	References

Like the paper this is based on, this focuses on the kernel only.  It 
might be nice to include a description of some of the architecture-
specific management software.  What happens from the time a processor 
starts failing to when it is removed?  What happens from the time a user 
clicks to move a CPU to when it is moved?  The kernel is just one piece 
of the puzzle (an important piece, to be sure).

> 
> 
> INTRODUCTION: WHY DO WE NEED HOTPLUG CPU?
> 
> As Linux becomes more prominent in the enterprise arena, in mission-
> critical data center installations, features that support RAS
> (Reliability, Availability and Serviceability) are required. Since
> modern processor architectures provide advanced error detection
> technology, offering the possibility to add and remove CPUs becomes
> extremely important for RAS support. However, CPU hotplug is not just
> useful for replacing defective components; it can also be applied in
> other contexts to increase the productivity of a system. For example,
> on a single system running multiple Linux partitions, as the workloads
> change it would be extremely useful to be able to move CPUs from one
> partition to the next as required, without rebooting or interrupting
> the workloads. This is known as dynamic partitioning.  Other
> applications include Instant Capacity on Demand, where extra CPUs are
> present in a system but are not activated. This is useful for customers
> who predict growth, and therefore the need for more computing power,
> but who cannot afford the additional capacity at the time of purchase.

There are some issues with commas and other grammar that I assume will 
get edited.

> 
> WHICH PARTS OF THE KERNEL ARE AFFECTED BY HOTPLUG CPU?
> 
> 	1. SMP Boot Process.
> The original smp_boot_cpus()/smp_commence() sequence has been replaced
> by the smp_prepare_cpus()/__cpu_up()/smp_cpus_done() sequence of calls.

Might call it "SMP Boot Sequence", as "Process" also has other meanings.

> 
> 	2. Interrupt Handling.
> When a CPU is taken offline, the interrupts that it was handling have
> to be retargeted; therefore changes need to be made to the interrupt
> handling process. Refer to the section entitled "Architecture
> Specifics" to see how each architecture dealt with this issue.
> 
> 	3. Per-CPU threads handling
> ???????The kernel thread helper functions were not only needed for
> hotplug cpu but more generally to create a cleaner environment for
> interaction w/ userspace. Should I still mention it here???????

Since this came about because of CPU hotplug and lived in the CPU 
hotplug tree for quite some time before being merged to -mm, I think it 
is worth mentioning.

Somebody should also mention the "bogolock".  Because hotplug code 
touches so many critical paths, a new locking mechanism was necessary. 
This lock has very lightweight reads and the heaviest writes imaginable. 
Basically, to read you disable interrupts (which has to be done in most 
places it is used anyway, making it free for those sections), and to 
write you start kthreads on every CPU at highest priority with 
interrupts disabled.  This stops the machine in every sense of the word 
while the write takes place.  Writes have to be fast so as not to risk 
dropping disk I/O, ethernet packets, timer pops, etc.

> 
> 
> FLOW CHART FOR TAKING A CPU OFFLINE:
> 
> This section provides the steps required to take a CPU offline; for
> historical details on how this process came about, please refer to the
> article published at OLS 2004 (see reference xxx in the bibliography).
> 
> The general idea is to first make sure that the CPU being taken
> offline is a valid candidate, then put the system in a freeze mode for
> long enough to update the cpu_online_map, then disable the CPU and
> kill it using architecture-specific procedures, and finally notify
> userspace that it is offline.

And of course migrate the interrupts and running work to other CPU(s).

> 
> The bulk of the code taking CPUs offline can be found in kernel/cpu.c
> and kernel/stop_machine.c

The bulk of the architecture independent code, yes.

> 
> Here's a step-by-step explanation of what is done by the code:
> 
> 		1. Take the cpu_control semaphore to make sure that no
> other hotplug event will be handled at the same time.
> 		2. Make sure there is at least one other CPU online
> (taking the last running CPU in a system offline is NOT a good idea!).
> 		3. Make sure the CPU is actually online.
> 		4. Take the CPU being offlined out of the available CPU
> mask of the current process. This ensures that the task performing the
> offlining will not migrate back onto the offlined CPU.
> 		5. To avoid having to hold a lock to access
> cpu_online_map, which would negatively impact the scheduler, we call
> stop_machine_run(). This routine schedules a high-priority thread per
> CPU. When all the per-CPU threads are ready, interrupts on all CPUs are
> simultaneously disabled. This is done so that, to access cpu_online_map
> reliably, all you need to do is turn preemption off.
> 		6. Once the system is in that frozen state, we execute
> the function passed as argument to stop_machine_run(), in this case
> take_cpu_down. This function first takes the CPU out of cpu_online_map;
> this is currently done in generic code.
> 		7. It then disables the CPU by calling an arch-specific
> __cpu_disable function to guarantee that no more interrupts will be
> received by this CPU (if this call fails, the cpu_online_map is
> restored and restart_machine is called immediately). As part of
> __cpu_disable, the architecture-specific code does its own checks to
> see if this CPU can be taken offline (each architecture might have its
> own set of constraints limiting which CPUs can be taken offline; for
> example, offlining the boot CPU might not be allowed), migrates the
> interrupts to other CPUs, and does appropriate clean-up as required
> (for example, handling local timer issues).
> 		8. Then the CPU is put in the idle state by calling
> sched_idle_next(), which gives the idle task a high priority to ensure
> that nothing else will run on the offlined CPU. This allows the
> migration of user tasks away from the offlined CPU to be done after the
> call to restart_machine (a clever trick to minimize the amount of time
> the whole machine is frozen).
> 		9. When the CPU is idle, call the arch-specific function
> __cpu_die to kill the offlined CPU.
> 		10. Call the CPU_DEAD notifier, whereupon the scheduler
> migrates tasks off the dead CPU and restores the idle task, the
> workqueues remove the unneeded threads, ksoftirqd is stopped and
> pending softirqs are handled, per-CPU caches are released, timers are
> migrated, etc.
> 		11. Notify userspace by calling /sbin/hotplug cpu
> ACTION=offline.
> 		12. Release the cpu_control semaphore.
>  
> 
> FLOW CHART FOR BRINGING A CPU ONLINE:
> 
> This section provides the steps required to bring a CPU online.
> The general idea is to first make sure that the CPU is present and not
> already online, create new kernel threads to support that CPU, put the
> CPU in the cpu_online_map, then start the kernel threads, and finally
> notify userspace that the CPU is online.
> 
> The bulk of the code bringing CPUs online can be found in kernel/cpu.c
> Here's a step-by-step explanation of what is done by the code:
> 
> 		1. Take the cpu_control semaphore to make sure that no
> other hotplug event will be handled at the same time.
> 		2. Make sure that the CPU is present and not already
> online.
> 		3. Call the notifier with CPU_UP_PREPARE to create new
> kernel threads, such as the migration thread, workqueue thread, etc.,
> for that CPU.
> 		4. Bring the CPU up by calling an arch-specific __cpu_up
> function, which puts the CPU in the cpu_online_map and enables local
> interrupts. If that call fails, call the notifier with CPU_UP_CANCELED,
> which will reverse the above steps.
> 		5. Finally, call the notifier with CPU_ONLINE to kick off
> the worker threads and start the timers.
> 		6. Notify userspace by calling /sbin/hotplug cpu
> ACTION=online.
> 		7. Release the cpu_control semaphore.
