I've revised the pseudocode to match what was coded and to make it a bit
more readable. I haven't done case 5 since that's not been finished.
These will be included in the LHCS Regression Test Suite package.
Attached below...

Bryce


Testcase 01
-----------
This test attempts to verify that when a CPU is offlined, a process
writing to disk doesn't cause an issue. We create a process that writes
to disk, force it to run only on a specified CPU by setting its CPU
affinity to just that CPU, then offline that CPU, and verify that the
process moves to another processor properly.

Notes
=====
There are two kinds of masks: one to specify which CPUs are allowed to
be used for the given process, and one for the SMP affinity. This may be
hard to verify directly, but we can check it indirectly by looking at
/proc/stat or by measuring the relative performance of some parallelized
benchmark before and after onlining the CPU.

Algorithm
=========
Given a CPU to test that exists

Take a snapshot of what CPUs are on and off initially

Make sure the CPU is online

Start up a process that writes to disk

Loop until done:
  Take a snapshot of /proc/interrupts
  Foreach CPU in the system
    online the CPU
    migrate the IRQs to it
    sleep a little while
  Foreach CPU in the system
    migrate IRQs onto the CPU
    offline the CPU
    sleep a little while
  Take another snapshot of /proc/interrupts
  Print a report showing the change in IRQs

When exiting:
  Kill the write loop process
  Restore all CPUs to their initial state


Testcase 02
-----------
This test checks that a process migrates when the CPU it is running on
is offlined.
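
One way to determine which CPU a process is on, as Testcase 02 needs to
do, is to read the "processor" field (field 39 in proc(5)) of
/proc/<pid>/stat. The following is a minimal Python sketch under that
assumption, not the suite's actual code; the function names are
hypothetical, and pin_to_cpu() simply wraps the standard
os.sched_setaffinity() call available on Linux.

```python
import os


def last_cpu_from_stat(stat_line: str) -> int:
    """Return the CPU a task last ran on, given a /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')' characters, so split after the LAST ')'.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field 39 ("processor") is at 39 - 3.
    return int(fields[39 - 3])


def current_cpu(pid: int) -> int:
    """Read /proc/<pid>/stat and report the CPU the process last ran on."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())


def pin_to_cpu(pid: int, cpu: int) -> None:
    """Restrict the process to a single CPU (Linux only)."""
    os.sched_setaffinity(pid, {cpu})
```

A test could call pin_to_cpu(pid, cpu), offline the CPU, and then check
that current_cpu(pid) reports a different CPU number.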

Algorithm
=========
Given a CPU to test that exists

Make sure the CPU is online

Start a process that just uses processor cycles

Loop until done:
  Move the process to the CPU we will be offlining
  Offline the CPU
  Determine which CPU the process migrated to
  Verify that it is still running
  Verify that it is not running on the original CPU
  Turn the CPU back online

When exiting:
  Kill the spin loop process


Testcase 03
-----------
This test verifies that when you online a new CPU, the scheduler takes
advantage of it by shifting some of its workload onto it. We do this by
offlining a CPU, creating a bunch of processor-intensive processes, then
onlining the CPU and checking that at least one of the processes moved
to that CPU.

Algorithm
=========
Given a CPU to test that exists

Take a snapshot of what CPUs are on and off initially

Loop until done:
  Online all of the CPUs and note their state
  Offline the specified CPU
  Start up a number of processes equal to twice the number of CPUs we
    have, so we can be pretty sure that we've got enough processes that
    at least one will migrate to the new CPU.
  Now online the specified CPU
  Wait a few seconds, to allow the process scheduler to move processes
    around a bit.
  Verify that at least one process has migrated to the new CPU by
    looking at the output from 'ps -o psr -o comm' and searching for
    our CPU running the process.

When exiting:
  Kill all of the load processes
  Restore all CPUs to their initial state


Testcase 04
-----------
This test verifies that we can't offline ALL of the CPUs in the system.
We do this by onlining all the CPUs, then offlining all the CPUs and
verifying that an error is returned for the last one.

Algorithm
=========
Loop until done:
  Take a snapshot of what CPUs are on and off initially
  Online all the CPUs
  Offline all the CPUs, verifying that an error is returned for the
    last one
  Restore system to initial state


Testcase 06
-----------
It's been found that sometimes onlining and offlining CPUs confuses some
of the various system tools.
In particular, we found it caused top to crash, and found that sar
wouldn't register newly available CPUs that weren't there when it
started. This test case seeks to exercise these known error cases and
verify that they behave correctly now.

Algorithm - Top
===============
Given a CPU to test that exists

Make sure the specified CPU is online

Loop until done:
  Start up top and give it a little time to run
  Offline the specified CPU
  Wait a little time for top to notice the CPU is gone
  Now check that top hasn't crashed by verifying its PID is still
    being reported by ps.

When exiting:
  Kill the top process
  Restore all CPUs to their initial state

Algorithm - Sar
===============
Given a CPU to test that exists

Make sure the specified CPU is offline

Loop until done:
  Start up sar writing to a temp log and give it a little time to run
  Verify that sar has correctly listed the missing CPU as 'nan' in its
    tmp log
  Take a timestamp and count how many CPUs sar is reporting to be
    offline
  Online the specified CPU
  Take another timestamp and another count of offlined CPUs.
  Verify that the number of CPUs offline has changed

When exiting:
  Kill the sar process
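

The "snapshot", "online", "offline" and "restore" steps that recur in
the test cases above could be sketched as follows. This is only an
illustration in Python, not the suite's actual code: it assumes the
standard Linux sysfs interface (/sys/devices/system/cpu/cpuN/online),
all function names are hypothetical, and the `root` parameter exists
only so the helpers can be pointed at a scratch directory. Writing to
the real sysfs files requires root privileges.

```python
import re
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def _online_file(cpu, root=SYSFS_CPU):
    return Path(root) / f"cpu{cpu}" / "online"


def list_cpus(root=SYSFS_CPU):
    """Return the numbers of all cpuN directories, sorted."""
    cpus = []
    for entry in Path(root).iterdir():
        m = re.fullmatch(r"cpu(\d+)", entry.name)  # skips cpufreq, cpuidle
        if m:
            cpus.append(int(m.group(1)))
    return sorted(cpus)


def is_online(cpu, root=SYSFS_CPU):
    f = _online_file(cpu, root)
    # A CPU with no "online" file (often cpu0) cannot be offlined,
    # so it is treated as always online here.
    if not f.exists():
        return True
    return f.read_text().strip() == "1"


def set_online(cpu, online, root=SYSFS_CPU):
    """Write 1 or 0 to the CPU's online file; raises OSError on failure."""
    _online_file(cpu, root).write_text("1" if online else "0")


def snapshot(root=SYSFS_CPU):
    """Record which CPUs are on and off, as {cpu_number: bool}."""
    return {cpu: is_online(cpu, root) for cpu in list_cpus(root)}


def restore(state, root=SYSFS_CPU):
    """Put every hot-pluggable CPU back into its recorded state."""
    for cpu, online in state.items():
        if _online_file(cpu, root).exists() and is_online(cpu, root) != online:
            set_online(cpu, online, root)
```

A test would call snapshot() before its loop and restore() with the
saved result in its exit handler, so the machine is left as it was
found even if the test is interrupted.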