Lessons Learned from Hotplug CPU Test Case Development
======================================================

Looking back, the Hotplug CPU test case effort worked pretty well. We were
able to involve a lot of different people with different levels of skill in
the process, such that the actual coding of the test cases was fairly
straightforward. I think we would benefit from following this same model as
we start looking at memory hotplug testing, and based on the experience with
CPU I have some thoughts on how we could improve the process. The following
is divided into "design-time" and "implementation-time" thoughts.

TEST CASE DESIGN
================

1. Make the test descriptions specific.

   We found that some of the CPU test cases were easier to code than others,
   because some items were described ambiguously. For example, "Verify the
   flux capacitor works" raises four questions: At what point do we do the
   verification? What's a flux capacitor? How do we know that it works? What
   do we do if it doesn't work? A better description would say, "After
   switching it on, check that LED42 is green on the Flux Capacitor Status
   Panel, and mark the test FAILED if it isn't."

   Example command lines are extremely helpful. For instance, instead of
   saying, "Use 'foo' to check that bar works," it is even more useful to
   say, "Run something like 'foo -a -b 3 -c 10', and verify that 'bar' is
   listed in the output."

2. State even the obvious.

   With hotplug01, one of the things that really slowed us down was that we
   didn't know what an "affinity mask" was, nor how to generate the hex
   masks, verify them, and so on. It was only after seeing someone else's
   test case that we finally understood. In the test definitions, be sure to
   define new concepts or give pointers to examples; unfamiliar terminology
   can really slow down implementation. (A short sketch of generating and
   verifying an affinity mask appears after this list.)

   Also, when writing the test case description, write down a few sentences
   describing what the test case should do and what issue it addresses. In
   some cases we had to do research to work out what the test cases were
   for, and it would save time to have this up front. The person writing the
   test case probably already has this in mind at the time, so it should be
   easy to jot down early on.

3. Avoid reliance on interactive tools.

   Some of the biggest challenges we had in implementing the test cases came
   from phrases specifying monitoring with interactive tools. For example,
   "Start a and b, then watch top to make sure c happens." top is easy to
   use in interactive testing, but for automated test cases it can be
   complicated to rely on tools like these. We were usually able to find
   another approach, such as extracting the same information from a file in
   /proc. (See the /proc-based sketch after this list.)

4. Pseudocode.

   We found that after reviewing the test cases, implementing them in crude
   pseudocode was worthwhile, because it helped identify the basic logic in
   a form that could be easily translated into bash. This intermediate
   format was also simple enough that others could review and comment on it
   without having to dig through extraneous structural and error-checking
   code. The key objective of the pseudocode is to work out solutions for
   the technical details: how to operate various tools or workloads,
   algorithms for parsing output from other tools, order of operations, loop
   structures, and so forth. Things like error handling and output
   formatting can be left to the implementor.
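   To make item 2 concrete, here is a minimal sketch of building and
   verifying a CPU affinity mask without any interactive tool. It assumes
   the taskset(1) utility from util-linux is available; the CPU number and
   the use of the current shell as the target process are just placeholders.

       #!/bin/bash
       # Build a hex affinity mask that allows only CPU $cpu, apply it,
       # then read it back to verify.  Assumes taskset(1) is installed.
       cpu=1
       pid=$$                                # this shell as the example process

       mask=$(printf '%x' $((1 << cpu)))     # e.g. CPU 1 -> mask "2"

       taskset -p "$mask" "$pid" > /dev/null || exit 1

       # taskset -p prints: "pid 1234's current affinity mask: 2"
       current=$(taskset -p "$pid" | awk '{print $NF}')

       if [ "$current" = "$mask" ]; then
           echo "affinity PASS: mask is $current"
       else
           echo "affinity FAIL: expected $mask, got $current"
           exit 1
       fi

   A couple of lines like these in the test description would have answered
   most of the questions we had about affinity masks up front.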
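   And for item 3, rather than watching top, the same kind of check can
   usually be scripted against /proc. The sketch below reads the
   "processor" field (field 39) of /proc/<pid>/stat to see which CPU a task
   last ran on; the busy-loop workload is only illustrative, and the simple
   awk field count assumes the process name contains no spaces.

       #!/bin/bash
       # Non-interactive alternative to "watch top": read /proc/<pid>/stat
       # to confirm which CPU a workload is running on.

       # Illustrative workload -- replace with the real test workload.
       ( while :; do :; done ) &
       workpid=$!

       sleep 2    # give the scheduler time to place the task

       # Field 39 of /proc/<pid>/stat is the CPU the task last executed on.
       last_cpu=$(awk '{print $39}' /proc/$workpid/stat)
       echo "workload $workpid last ran on CPU $last_cpu"

       kill $workpid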
TEST CASE IMPLEMENTATION
========================

5. Keep It Short and Simple (KISS).

   We made each test case focus on a specific kind of test. We preferred
   breaking things like file parsing, workloads, etc. into separate scripts
   rather than trying to pile everything into a single script: do one thing
   and do it well. We also designed the test cases so they can be run
   through once very quickly, or optionally run in loops; this way a
   developer can use the suite as a quick sanity check, while a tester can
   configure it to run continuously under a workload.

   Think of the individual test cases as building blocks from which more
   complex tests can be assembled later. Keeping the test cases simple and
   doing the more sophisticated logic at a higher level gives everyone more
   flexibility and power for future testing.

6. Parameterize with environment variables.

   In general, each test case should run with no parameters, or with an
   absolute minimum, so it is easy to use. However, the test cases are more
   useful if you can override certain internal settings, such as the number
   of loops to run through. We found the most convenient way to do this in
   bash was with environment variables. Here is the syntax we used:

       HOTPLUG06_LOOPS=${HOTPLUG06_LOOPS:-${LOOPS}}
       loop_six=${HOTPLUG06_LOOPS:-1}

   This allows the user to override the looping for this specific test case
   by setting the environment variable $HOTPLUG06_LOOPS, or to set looping
   for _all_ test cases via the variable $LOOPS. If neither variable is set,
   it defaults to 1 loop.

   When describing test cases, also think about other ways to parameterize
   them, such as the length of time to sleep between various operations,
   temporary file names, names and paths for input or output files, commands
   that may be platform-specific, etc.

7. Clean up after yourself.

   The CPU hotplug test cases turned CPUs on and off. Obviously, it would be
   annoying if the test suite finished with some of your processors left off
   that were on before! We adopted the practice of ensuring that each test
   case left the system more or less as it found it. Thus, a test case that
   attempts to turn all CPUs on and off would keep track of which CPUs were
   on or off at the start, do its testing, and then restore the CPUs to
   their original on/off states. (A sketch of this save-and-restore pattern
   follows below.)

   We also liked Ashok Raj's approach of trapping user interrupt signals to
   perform cleanup:

       do_intr() {
           echo "HotPlug01 FAIL: User interrupt"
           do_clean
           exit 1
       }

       trap "do_intr" 1 2 15

   This is in test case 1, with plans to add it to all the other test cases.
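   As a concrete example of item 7, here is a minimal sketch of recording
   which CPUs are online before a test and restoring that state afterwards,
   using the standard sysfs hotplug files (/sys/devices/system/cpu/cpuN/online).
   The do_test function is only a placeholder for the real test body; the
   script needs root, a bash with associative arrays, and omits error
   handling for brevity.

       #!/bin/bash
       # Record each CPU's online state, run the test, then restore the
       # state, so the suite leaves the machine as it found it.

       declare -A orig_state

       restore_cpus() {
           for cpu in "${!orig_state[@]}"; do
               echo "${orig_state[$cpu]}" > "/sys/devices/system/cpu/$cpu/online"
           done
       }

       do_test() {
           :   # placeholder for the real on/off testing
       }

       # Remember which CPUs were online before we start.
       for path in /sys/devices/system/cpu/cpu[0-9]*; do
           cpu=${path##*/}
           [ -f "$path/online" ] || continue   # cpu0 usually cannot be offlined
           orig_state[$cpu]=$(cat "$path/online")
       done

       # Restore state on HUP/INT/TERM, the same signals trapped above.
       trap 'restore_cpus; exit 1' 1 2 15

       do_test
       restore_cpus

   Keeping the restore logic in one function also means the interrupt trap
   and the normal exit path share the same cleanup code.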