Hi all,

I haven't been able to spend as much time on this document as I was hoping to, but I figured that if I release a first draft to this small group I'll get the proverbial ball rolling. As you will see, this is a very rough and incomplete draft (I've only made it 'til section 4). It is mainly based on the paper published at OLS 2004 entitled "Linux Kernel Hotplug CPU Support" and on the code currently in mainline.

Please give me feedback on the following things:

1. Accuracy of the content.

2. Does the table of contents cover everything the document should cover? (If you think that some of the stuff I'm thinking of covering is useless or not appropriate for this document, please also let me know.) Also look for "?????", which indicates that I have a question for you.

3. Am I too verbose? (I've also been accused of having a Proustian style of writing; let me know if I did it again.)

4. Any other comments, suggestions, corrections, etc. are welcome and appreciated.

Thanks - Martine

========================================================================

LINUX KERNEL HOTPLUG CPU DOCUMENTATION

Table of contents:

1. Introduction: why do we need hotplug CPU?
2. Which parts of the kernel are affected by hotplug CPU?
3. Flow chart for taking a CPU offline
4. Flow chart for bringing a CPU online
5. Infrastructure required to support CPU hotplug (new structs, locks, semaphores, functions, etc.)
6. Architecture specifics
7. Remaining issues
8. How to use it
9. References

INTRODUCTION: WHY DO WE NEED HOTPLUG CPU?

As Linux becomes more prominent in the enterprise arena, in mission-critical data-center type installations, features that support RAS (Reliability, Availability and Serviceability) are required. Since modern processor architectures provide advanced error detection and correction technology, offering the possibility to add and remove CPUs becomes extremely important for RAS support.

However, CPU hotplug is not just useful for replacing defective components; it can also be applied in other contexts to increase the productivity of a system. For example, on a single system running multiple Linux partitions, it would be extremely useful to be able to move CPUs from one partition to the next as the workloads change, without rebooting or interrupting the workloads. This is known as dynamic partitioning.

Other applications include Instant Capacity on Demand, where extra CPUs are present in a system but aren't activated. This is useful for customers that predict growth and therefore the need for more computing power, but cannot afford the extra CPUs at the time of purchase.

WHICH PARTS OF THE KERNEL ARE AFFECTED BY HOTPLUG CPU?

1. SMP boot process. The original smp_boot_cpus()/smp_commence() sequence has been replaced by the smp_prepare_cpus()/__cpu_up()/smp_cpus_done() sequence of calls (see the sketch after this list).

2. Interrupt handling. When a CPU is taken offline, the interrupts that it was handling have to be retargeted to other CPUs, so changes need to be made to the interrupt handling process. Refer to section 6, "Architecture specifics", to see how each architecture dealt with this issue.

3. Per-CPU threads handling. ????? The kernel thread helper functions were not only needed for hotplug CPU but more generally to create a cleaner environment for interaction with userspace. Should I still mention them here? ?????
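To make point 1 concrete, here is a simplified sketch of how the generic code drives the new boot sequence, modeled on 2.6-era init/main.c (details and error handling are trimmed, so treat it as illustrative rather than authoritative). smp_prepare_cpus(max_cpus) is called first from the init thread; smp_init() then brings the secondary CPUs up one by one:

	/* Simplified sketch of the generic SMP boot flow (2.6 era).
	 * The architecture hooks are smp_prepare_cpus(), __cpu_up()
	 * (reached through cpu_up()) and smp_cpus_done(). */
	static void __init smp_init(void)
	{
		unsigned int cpu;

		for_each_present_cpu(cpu) {
			if (num_online_cpus() >= max_cpus)
				break;
			if (!cpu_online(cpu))
				cpu_up(cpu);	/* ends up in arch __cpu_up() */
		}

		smp_cpus_done(max_cpus);	/* arch hook: all CPUs are up */
	}

Note that cpu_up() is the same path used to hot-add a CPU later on, which is what lets boot-time bring-up and hotplug share their infrastructure.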
FLOW CHART FOR TAKING A CPU OFFLINE:

This section provides the steps required to take a CPU offline. For historical details on how this process came about, please refer to the article published at OLS 2004 (see reference xxx in the bibliography).

The general idea is to first make sure that the CPU being taken offline is a valid candidate, then put the system in a freeze mode just long enough to update the cpu_online_map, then disable the CPU and kill it using architecture specific procedures, and finally notify userspace that it is offline. The bulk of the code taking CPUs offline can be found in kernel/cpu.c and kernel/stop_machine.c.

Here's a step-by-step explanation of what is done by the code:

1. Take the cpucontrol semaphore to make sure that no other hotplug event will be handled at the same time.

2. Make sure there's at least one other CPU online (taking the last running CPU in a system offline is NOT a good idea!).

3. Make sure the CPU is actually online.

4. Take the CPU being offlined out of the allowed-CPU mask of the current process, so that the task performing the offlining cannot migrate back onto the offlined CPU.

5. To avoid having to hold a lock to access cpu_online_map, which would negatively impact the scheduler, we call stop_machine_run(). This routine schedules a high priority thread on each CPU; when all the per-CPU threads are ready, interrupts on all CPUs are simultaneously disabled. This is done so that, to access cpu_online_map reliably, all you need to do is turn preemption off (see the first sketch after this list).

6. Once the system is in that frozen state, we execute the function passed as argument to stop_machine_run(), in this case take_cpu_down(). This function first takes the CPU out of cpu_online_map; this is currently done in generic code.

7. take_cpu_down() then disables the CPU by calling the arch specific __cpu_disable() function, which guarantees that no more interrupts will be received by this CPU (if this call fails, cpu_online_map is restored and the machine is restarted immediately). As part of __cpu_disable() the architecture specific code performs its own checks to see if this CPU can be taken offline (each architecture might have its own set of constraints limiting which CPUs can be taken offline; for example, offlining the boot CPU might not be allowed), migrates the interrupts to other CPUs, and does appropriate clean-up as required (for example, handling local timer issues).

8. The CPU is then put in the idle state by calling sched_idle_next(), which gives the idle task a high priority to ensure that nothing else will run on the offlined CPU. This allows the migration of user tasks away from the offlined CPU to be done after the machine is restarted (a clever trick to minimize the amount of time the whole machine is frozen).

9. When the CPU is idle, call the arch specific function __cpu_die() to kill the offlined CPU.

10. Call the notifier chain with CPU_DEAD: the scheduler uses this to migrate tasks off the dead CPU and restore the idle task, the workqueues remove the now unneeded threads, ksoftirqd is stopped and pending softirqs are handled, per-CPU caches are released, timers are migrated, etc. (See the notifier example after this list.)

11. Notify userspace by calling /sbin/hotplug cpu ACTION=offline.

12. Release the cpucontrol semaphore.
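To illustrate step 5, here is a minimal sketch of the access pattern that stop_machine_run() makes safe. The preempt_disable()/cpu_online() pattern is the real one; my_do_work() is a hypothetical helper standing in for whatever per-CPU work a subsystem does:

	/* Because offlining happens inside stop_machine_run(), the
	 * online map cannot change while preemption is disabled,
	 * so no lock is needed to test it. */
	preempt_disable();
	if (cpu_online(cpu))
		my_do_work(cpu);	/* 'cpu' cannot go away under us */
	preempt_enable();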
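And to illustrate step 10 (as well as steps 3-5 of the online flow below), here is a hedged sketch of how a subsystem hooks into the CPU notifier chain. my_cpu_callback() and my_cpu_nb are made-up names; the events and register_cpu_notifier() come from <linux/cpu.h> and <linux/notifier.h> of that era:

	#include <linux/cpu.h>
	#include <linux/notifier.h>

	/* Hypothetical subsystem callback reacting to hotplug events. */
	static int my_cpu_callback(struct notifier_block *nb,
				   unsigned long action, void *hcpu)
	{
		unsigned int cpu = (unsigned long)hcpu;

		switch (action) {
		case CPU_UP_PREPARE:
			/* allocate per-CPU resources, start threads for 'cpu' */
			break;
		case CPU_ONLINE:
			/* 'cpu' is running: kick its per-CPU threads */
			break;
		case CPU_UP_CANCELED:
		case CPU_DEAD:
			/* undo CPU_UP_PREPARE / clean up after 'cpu' */
			break;
		}
		return NOTIFY_OK;
	}

	static struct notifier_block my_cpu_nb = {
		.notifier_call = my_cpu_callback,
	};

	/* in the subsystem's init code: */
	register_cpu_notifier(&my_cpu_nb);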
FLOW CHART FOR BRINGING A CPU ONLINE:

This section provides the steps required to bring a CPU online.

The general idea is to first make sure that the CPU is present and not already online, create new kernel threads to support that CPU, put the CPU in the cpu_online_map, then start the kernel threads, and finally notify userspace that the CPU is online. The bulk of the code bringing CPUs online can be found in kernel/cpu.c.

Here's a step-by-step explanation of what is done by the code (a simplified code sketch follows the list):

1. Take the cpucontrol semaphore to make sure that no other hotplug event will be handled at the same time.

2. Make sure that the CPU is present and not already online.

3. Call the notifier chain with CPU_UP_PREPARE to create the new kernel threads, such as the migration thread, workqueue thread, etc., for that CPU.

4. Bring the CPU up by calling the arch specific __cpu_up() function, which puts the CPU in the cpu_online_map and enables the local interrupts. If that call fails, call the notifier chain with CPU_UP_CANCELED, which reverses the above steps.

5. Finally, call the notifier chain with CPU_ONLINE to kick off the worker threads and start the timers.

6. Notify userspace by calling /sbin/hotplug cpu ACTION=online.

7. Release the cpucontrol semaphore.
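Putting these steps together, here is a simplified sketch of cpu_up(), modeled on 2.6-era kernel/cpu.c (printks and sanity checks are trimmed, and the userspace notification of step 6 is not shown, so treat this as illustrative):

	/* Simplified sketch of cpu_up(), following the steps above. */
	int cpu_up(unsigned int cpu)
	{
		int ret;
		void *hcpu = (void *)(long)cpu;

		ret = down_interruptible(&cpucontrol);		/* step 1 */
		if (ret != 0)
			return ret;

		if (cpu_online(cpu) || !cpu_present(cpu)) {	/* step 2 */
			ret = -EINVAL;
			goto out;
		}

		ret = notifier_call_chain(&cpu_chain,
					  CPU_UP_PREPARE, hcpu);  /* step 3 */
		if (ret == NOTIFY_BAD) {
			ret = -EINVAL;
			goto out_notify;
		}

		ret = __cpu_up(cpu);		/* step 4: arch specific */
		if (ret != 0)
			goto out_notify;

		notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu); /* step 5 */

	out_notify:
		if (ret != 0)			/* undo CPU_UP_PREPARE */
			notifier_call_chain(&cpu_chain, CPU_UP_CANCELED, hcpu);
	out:
		up(&cpucontrol);		/* step 7 */
		return ret;
	}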