Hi all,

I haven't been able to spend as much time on this document as I was hoping to, but I figured that if I release a first draft to this small group I'll get the proverbial ball rolling. As you will see, this is a very rough and incomplete draft (I've only made it 'til section 4). It is mainly based on the paper published at OLS 2004 entitled "Linux Kernel Hotplug CPU Support" and on the code currently in mainline.

Please give me feedback on the following things:

1. Accuracy of the content.

2. Does the table of contents cover everything the document should cover? (If you think that some of the stuff I'm thinking of covering is useless or not appropriate for this document, please also let me know.) Also look for "?????", which indicates that I have a question for you.

3. Am I too verbose? (I've also been accused of having a Proustian style of writing; let me know if I did it again.)

4. Any other comments, suggestions, corrections, etc. are welcome and appreciated.

Thanks - Martine

========================================================================

LINUX KERNEL HOTPLUG CPU DOCUMENTATION

Table of contents:

1. Introduction: why do we need hotplug CPU?
2. Which parts of the kernel are affected by hotplug CPU?
3. Flow chart for taking a CPU offline
4. Flow chart for bringing a CPU online
5. Infrastructure required to support CPU hotplug (new structs, locks, semaphores, functions, etc.)
6. Architecture specifics
7. Remaining issues
8. How to use it
9. References

INTRODUCTION: WHY DO WE NEED HOTPLUG CPU?

As Linux becomes more prominent in the enterprise arena, in mission-critical data-center type installations, features that support RAS (Reliability, Availability and Serviceability) are required. Since modern processor architectures provide advanced error detection and correction technology, offering the possibility to add and remove CPUs becomes extremely important for RAS support.

However, CPU hotplug is not just useful for replacing defective components; it can also be applied in other contexts to increase the productivity of a system. For example, on a single system running multiple Linux partitions, it would be extremely useful to be able to move CPUs from one partition to the next as the workloads change, without rebooting or interrupting the workloads. This is known as dynamic partitioning.

Other applications include Instant Capacity on Demand, where extra CPUs are present in a system but aren't activated. This is useful for customers that predict growth and therefore the need for more computing power, but cannot afford the extra CPUs at the time of purchase.

WHICH PARTS OF THE KERNEL ARE AFFECTED BY HOTPLUG CPU?

1. SMP boot process. The original smp_boot_cpus()/smp_commence() sequence has been replaced by the smp_prepare_cpus()/__cpu_up()/smp_cpus_done() sequence of calls (see the sketch after this list).

2. Interrupt handling. When a CPU is taken offline, the interrupts that it was handling have to be retargeted to other CPUs, so changes need to be made to the interrupt handling process. Refer to section 6, "Architecture specifics", to see how each architecture dealt with this issue.

3. Per-CPU threads handling. ????? The kernel thread helper functions were not only needed for hotplug CPU but more generally to create a cleaner environment for interaction with userspace. Should I still mention them here? ?????
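To make point 1 concrete, here is a simplified sketch of how the generic code drives the new boot sequence, modeled on 2.6-era init/main.c (details and error handling are trimmed, so treat it as illustrative rather than authoritative). smp_prepare_cpus(max_cpus) is called first from the init thread; smp_init() then brings the secondary CPUs up one by one:

	/* Simplified sketch of the generic SMP boot flow (2.6 era).
	 * The architecture hooks are smp_prepare_cpus(), __cpu_up()
	 * (reached through cpu_up()) and smp_cpus_done(). */
	static void __init smp_init(void)
	{
		unsigned int cpu;

		for_each_present_cpu(cpu) {
			if (num_online_cpus() >= max_cpus)
				break;
			if (!cpu_online(cpu))
				cpu_up(cpu);	/* ends up in arch __cpu_up() */
		}

		smp_cpus_done(max_cpus);	/* arch hook: all CPUs are up */
	}

Note that cpu_up() is the same path used to hot-add a CPU later on, which is what lets boot-time bring-up and hotplug share their infrastructure.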
FLOW CHART FOR TAKING A CPU OFFLINE:

This section provides the steps required to take a CPU offline. For historical details on how this process came about, please refer to the article published at OLS 2004 (see reference xxx in the bibliography).

The general idea is to first make sure that the CPU being taken offline is a valid candidate, then put the system in a freeze mode just long enough to update the cpu_online_map, then disable the CPU and kill it using architecture specific procedures, and finally notify userspace that it is offline. The bulk of the code taking CPUs offline can be found in kernel/cpu.c and kernel/stop_machine.c.

Here's a step-by-step explanation of what is done by the code:

1. Take the cpucontrol semaphore to make sure that no other hotplug event will be handled at the same time.

2. Make sure there's at least one other CPU online (taking the last running CPU in a system offline is NOT a good idea!).

3. Make sure the CPU is actually online.

4. Take the CPU being offlined out of the allowed-CPU mask of the current process, so that the task performing the offlining cannot migrate back onto the offlined CPU.

5. To avoid having to hold a lock to access cpu_online_map, which would negatively impact the scheduler, we call stop_machine_run(). This routine schedules a high priority thread on each CPU; when all the per-CPU threads are ready, interrupts on all CPUs are simultaneously disabled. This is done so that, to access cpu_online_map reliably, all you need to do is turn preemption off (see the first sketch after this list).

6. Once the system is in that frozen state, we execute the function passed as argument to stop_machine_run(), in this case take_cpu_down(). This function first takes the CPU out of cpu_online_map; this is currently done in generic code.

7. take_cpu_down() then disables the CPU by calling the arch specific __cpu_disable() function, which guarantees that no more interrupts will be received by this CPU (if this call fails, cpu_online_map is restored and the machine is restarted immediately). As part of __cpu_disable() the architecture specific code performs its own checks to see if this CPU can be taken offline (each architecture might have its own set of constraints limiting which CPUs can be taken offline; for example, offlining the boot CPU might not be allowed), migrates the interrupts to other CPUs, and does appropriate clean-up as required (for example, handling local timer issues).

8. The CPU is then put in the idle state by calling sched_idle_next(), which gives the idle task a high priority to ensure that nothing else will run on the offlined CPU. This allows the migration of user tasks away from the offlined CPU to be done after the machine is restarted (a clever trick to minimize the amount of time the whole machine is frozen).

9. When the CPU is idle, call the arch specific function __cpu_die() to kill the offlined CPU.

10. Call the notifier chain with CPU_DEAD: the scheduler uses this to migrate tasks off the dead CPU and restore the idle task, the workqueues remove the now unneeded threads, ksoftirqd is stopped and pending softirqs are handled, per-CPU caches are released, timers are migrated, etc. (See the notifier example after this list.)

11. Notify userspace by calling /sbin/hotplug cpu ACTION=offline.

12. Release the cpucontrol semaphore.
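To illustrate step 5, here is a minimal sketch of the access pattern that stop_machine_run() makes safe. The preempt_disable()/cpu_online() pattern is the real one; my_do_work() is a hypothetical helper standing in for whatever per-CPU work a subsystem does:

	/* Because offlining happens inside stop_machine_run(), the
	 * online map cannot change while preemption is disabled,
	 * so no lock is needed to test it. */
	preempt_disable();
	if (cpu_online(cpu))
		my_do_work(cpu);	/* 'cpu' cannot go away under us */
	preempt_enable();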
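And to illustrate step 10 (as well as steps 3-5 of the online flow below), here is a hedged sketch of how a subsystem hooks into the CPU notifier chain. my_cpu_callback() and my_cpu_nb are made-up names; the events and register_cpu_notifier() come from <linux/cpu.h> and <linux/notifier.h> of that era:

	#include <linux/cpu.h>
	#include <linux/notifier.h>

	/* Hypothetical subsystem callback reacting to hotplug events. */
	static int my_cpu_callback(struct notifier_block *nb,
				   unsigned long action, void *hcpu)
	{
		unsigned int cpu = (unsigned long)hcpu;

		switch (action) {
		case CPU_UP_PREPARE:
			/* allocate per-CPU resources, start threads for 'cpu' */
			break;
		case CPU_ONLINE:
			/* 'cpu' is running: kick its per-CPU threads */
			break;
		case CPU_UP_CANCELED:
		case CPU_DEAD:
			/* undo CPU_UP_PREPARE / clean up after 'cpu' */
			break;
		}
		return NOTIFY_OK;
	}

	static struct notifier_block my_cpu_nb = {
		.notifier_call = my_cpu_callback,
	};

	/* in the subsystem's init code: */
	register_cpu_notifier(&my_cpu_nb);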
FLOW CHART FOR BRINGING A CPU ONLINE:

This section provides the steps required to bring a CPU online.

The general idea is to first make sure that the CPU is present and not already online, create new kernel threads to support that CPU, put the CPU in the cpu_online_map, then start the kernel threads, and finally notify userspace that the CPU is online. The bulk of the code bringing CPUs online can be found in kernel/cpu.c.

Here's a step-by-step explanation of what is done by the code (a simplified code sketch follows the list):

1. Take the cpucontrol semaphore to make sure that no other hotplug event will be handled at the same time.

2. Make sure that the CPU is present and not already online.

3. Call the notifier chain with CPU_UP_PREPARE to create the new kernel threads, such as the migration thread, workqueue thread, etc., for that CPU.

4. Bring the CPU up by calling the arch specific __cpu_up() function, which puts the CPU in the cpu_online_map and enables the local interrupts. If that call fails, call the notifier chain with CPU_UP_CANCELED, which reverses the above steps.

5. Finally, call the notifier chain with CPU_ONLINE to kick off the worker threads and start the timers.

6. Notify userspace by calling /sbin/hotplug cpu ACTION=online.

7. Release the cpucontrol semaphore.
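Putting these steps together, here is a simplified sketch of cpu_up(), modeled on 2.6-era kernel/cpu.c (printks and sanity checks are trimmed, and the userspace notification of step 6 is not shown, so treat this as illustrative):

	/* Simplified sketch of cpu_up(), following the steps above. */
	int cpu_up(unsigned int cpu)
	{
		int ret;
		void *hcpu = (void *)(long)cpu;

		ret = down_interruptible(&cpucontrol);		/* step 1 */
		if (ret != 0)
			return ret;

		if (cpu_online(cpu) || !cpu_present(cpu)) {	/* step 2 */
			ret = -EINVAL;
			goto out;
		}

		ret = notifier_call_chain(&cpu_chain,
					  CPU_UP_PREPARE, hcpu);  /* step 3 */
		if (ret == NOTIFY_BAD) {
			ret = -EINVAL;
			goto out_notify;
		}

		ret = __cpu_up(cpu);		/* step 4: arch specific */
		if (ret != 0)
			goto out_notify;

		notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu); /* step 5 */

	out_notify:
		if (ret != 0)			/* undo CPU_UP_PREPARE */
			notifier_call_chain(&cpu_chain, CPU_UP_CANCELED, hcpu);
	out:
		up(&cpucontrol);		/* step 7 */
		return ret;
	}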