[Hotplug_sig] more cpu regression test cases

markw at osdl.org (Mark Wong) · Mon Mar 14 15:14:35 2005

I've started to outline more regression test cases based on Nathan's
suggestion.  (attached)  Comments?

We'll probably discuss it a bit on tomorrow's Hotplug SIG conference
call.

Thanks,
Mark
-------------- next part --------------
Test Case 1:
What happens to disk controller interrupts when you offline a CPU on
a multiprocessor system?

1. Note the current smp_affinity mask for the disk controller to stress.

   Set the IRQ smp_affinity mask for the disk controller to all CPUs.

   Echo the appropriate hex mask into /proc/irq/IRQ#/smp_affinity

   Verify the smp_affinity mask.

2. Start watching the interrupt counts in /proc/interrupts.

   Is it worth verifying tools such as sar at the same time?

3. Start writing to a disk.

   while true; do echo 1 > dud; sleep 1; done

   Suggestions for what to do in order to be able to verify all writes
   are completed and correct?

4. Offline a CPU, pick on cpu1.

   echo 0 > /sys/devices/system/cpu/cpu1/online

   cpu0 is not hotswappable on some architectures and will not have an online
   attribute.

   Can we pinpoint when a CPU goes offline?

   It's my understanding that timeslice overrun prevents
   'time echo 0 > /sys/devices/system/cpu/cpu1/online' from being an
   accurate measure of how long it takes to offline a CPU.

   A return of 0 (zero) signifies the successful completion of offlining the
   CPU from the kernel's point of view.

   Verify the smp_affinity mask of the affected disk controller.

5. Analyze data collected from /proc/interrupts?

   Relevant messages regarding the procedure will appear in /var/log/messages,
   depending on the architecture tested on.
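
A minimal sketch of two helpers for steps 1, 2 and 5, assuming a POSIX
shell and awk.  The function names all_cpus_mask and irq_counts are made up
for illustration, and the mask arithmetic assumes fewer than 63 CPUs.

```shell
#!/bin/sh
# Hypothetical helpers for Test Case 1; names are illustrative only.

# Print a hex affinity mask covering CPUs 0..(n-1), e.g. 4 -> f.
# $1 = number of CPUs (assumed small enough for shell arithmetic).
all_cpus_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Print the per-CPU interrupt counts for one IRQ, space separated.
# $1 = IRQ number, $2 = interrupts file (defaults to /proc/interrupts).
# The header line (CPU0 CPU1 ...) tells us how many count columns follow.
irq_counts() {
    awk -v irq="$1:" '
        NR == 1 { ncpu = NF; next }
        $1 == irq {
            for (i = 2; i <= ncpu + 1; i++)
                printf "%s%s", $i, (i <= ncpu ? " " : "\n")
        }' "${2:-/proc/interrupts}"
}
```

Redirecting the output of "all_cpus_mask 4" into /proc/irq/IRQ#/smp_affinity
would cover step 1 on a 4-way box, and calling irq_counts for the disk
controller's IRQ before and after step 4 gives the numbers to compare in
step 5.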

Test Case 2:
What happens to a process when you offline a CPU on a multiprocessor system?

1. Start a shell script that spins on a CPU.

2. Note the current process affinity mask of the spinning process using taskset.
   I believe there is at least one other tool available.  How important is it to
   note all of them?

   Set the processor affinity mask to cpu1, using taskset.

   Verify the processor affinity mask, using taskset.

3. Start recording the cpu utilization using sar.

   Is it worth verifying other tools at the same time?

4. Offline a CPU, pick on cpu1.

   echo 0 > /sys/devices/system/cpu/cpu1/online

   cpu0 is not hotswappable on some architectures and will not have an online
   attribute.

   Verify the processor affinity mask, using taskset.

5. Analyze processor utilization data collected.

   Relevant messages regarding the procedure will appear in /var/log/messages,
   depending on the architecture tested on.
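
The steps above could be sketched roughly as follows, assuming a POSIX
shell and taskset (the -p option reads or sets the mask of a running pid).
The names cpu_mask, spin and run_case2 are made up for illustration, and
run_case2 needs root to write the online attribute.

```shell
#!/bin/sh
# Hypothetical helpers for Test Case 2; names are illustrative only.

# Print a hex affinity mask selecting a single CPU, e.g. cpu 1 -> 2.
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Busy loop that keeps one CPU occupied until it is killed.
spin() {
    while :; do :; done
}

# Pin a spinner to one CPU, offline that CPU, and show the mask each time.
# $1 = CPU number to pin to and offline (defaults to 1).
run_case2() {
    cpu=${1:-1}
    spin &
    pid=$!
    taskset -p "$pid"                        # note the current mask
    taskset -p "$(cpu_mask "$cpu")" "$pid"   # pin to the chosen CPU
    taskset -p "$pid"                        # verify the new mask
    echo 0 > "/sys/devices/system/cpu/cpu$cpu/online"
    taskset -p "$pid"                        # mask after the offline
    kill "$pid"
}
```

Running sar in another terminal while run_case2 executes would supply the
utilization data for the last step.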

Test Case 3:
Check that tasks are scheduled on a newly on-lined CPU.

1. Offline a cpu.

2. Start one script that spins on a CPU per processor in the system,
   counting the cpu just offlined.

3. Monitor the processor utilization on each CPU.

4. Online the cpu from step 1.

5. Analyze the processor utilization to determine if one of the spinning
   tasks migrated to the new cpu.
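
A rough sketch of step 2, assuming a POSIX shell and that every processor,
online or not, shows up as a cpuN directory under /sys/devices/system/cpu.
The names count_cpus, spin and start_spinners are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for Test Case 3; names are illustrative only.

# Count cpuN directories, which includes CPUs that are currently offline.
# $1 = sysfs cpu directory (defaults to /sys/devices/system/cpu).
count_cpus() {
    dir=${1:-/sys/devices/system/cpu}
    n=0
    for d in "$dir"/cpu[0-9]*; do
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Busy loop that keeps one CPU occupied until it is killed.
spin() {
    while :; do :; done
}

# Start $1 spinners in the background and print their pids, one per line.
start_spinners() {
    i=0
    while [ "$i" -lt "$1" ]; do
        spin > /dev/null 2>&1 &
        echo "$!"
        i=$((i + 1))
    done
}
```

Something like 'start_spinners "$(count_cpus)"' after step 1 would leave
one more spinner than online CPUs, so a spinner should move to the CPU once
it comes back in step 4.  Per-CPU numbers for steps 3 and 5 could come from
'sar -P ALL', assuming the installed sysstat supports it.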

Test Case 4:
Offline the last running CPU.

1. Starting from the last cpu, as opposed to cpu0, offline every CPU except
   cpu0.

2. Verify if cpu0 can be offlined by checking the existence of
   /sys/devices/system/cpu/cpu0/online.

3. Offline cpu0, if the attribute exists, and check for EBUSY for correct
   behavior.
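
Steps 1 and 2 might look something like this, assuming a POSIX shell with
ls, sed and sort.  The names offline_all_but_cpu0 and cpu0_hotpluggable are
made up for illustration; the directory argument only exists so the
functions can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helpers for Test Case 4; names are illustrative only.

# Offline every CPU except cpu0, highest-numbered first.
# $1 = sysfs cpu directory (defaults to /sys/devices/system/cpu).
offline_all_but_cpu0() {
    dir=${1:-/sys/devices/system/cpu}
    for n in $(ls -d "$dir"/cpu[0-9]* | sed 's/.*cpu//' | sort -rn); do
        [ "$n" -eq 0 ] && continue
        f="$dir/cpu$n/online"
        # CPUs without an online attribute cannot be offlined; skip them.
        [ -f "$f" ] && echo 0 > "$f"
    done
    return 0
}

# Succeed only if cpu0 has an online attribute.
# $1 = sysfs cpu directory (defaults to /sys/devices/system/cpu).
cpu0_hotpluggable() {
    [ -f "${1:-/sys/devices/system/cpu}/cpu0/online" ]
}
```

For step 3, if cpu0_hotpluggable succeeds, writing 0 to cpu0's online
attribute should fail, and the shell should report the EBUSY as "Device or
resource busy".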

Test Case 5:
Stress Test

1. Start monitoring memory usage.
   vmstat? sar?

2. Start a cpu and memory intensive test for a duration of 4(?) hours.
   (tpc-c, reaim, suggestions?)

3. Offline and online processors at regular intervals throughout the
   duration of the test.

4. At the end of the test, analyze the system statistics to check for
   memory leaks.
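
Step 3 could be driven by a loop along these lines, assuming a POSIX shell.
The name hotplug_loop and its arguments are made up for illustration.  It
skips cpu0 and only ever has one CPU offline at a time, which is a choice
of this sketch rather than part of the plan.

```shell
#!/bin/sh
# Hypothetical driver for Test Case 5; names are illustrative only.

# Offline and re-online each CPU other than cpu0, one at a time.
# $1 = sysfs cpu directory, e.g. /sys/devices/system/cpu
# $2 = seconds to wait after each transition
# $3 = number of passes over all CPUs
# Prints "<epoch seconds> <attribute path> <new state>" per transition so
# the log can be lined up against the memory statistics afterwards.
hotplug_loop() {
    dir=$1
    interval=$2
    rounds=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for f in "$dir"/cpu[1-9]*/online; do
            [ -f "$f" ] || continue
            echo 0 > "$f"
            echo "$(date +%s) $f 0"
            sleep "$interval"
            echo 1 > "$f"
            echo "$(date +%s) $f 1"
            sleep "$interval"
        done
        i=$((i + 1))
    done
}
```

The interval and number of passes would be picked so the loop spans the
whole run of the workload started in step 2.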