These are the minutes of the Hotplug SIG meeting today, July 5.

Attendees:
Mary Meredith, Mark Wong, Bryce Harrington, OSDL
George Mann, individual contributor
Natalie Protasevich, Unisys
Bruce Vessey, Unisys
(Martine is on vacation)

The next regular meeting is canceled due to OLS; the next call is August 2.

Action Items from the meeting:
1-Bryce, Mary: Reconcile the SIG homepage, Mark's original test web
page, and Bryce's new web page to be sure that nothing was dropped from
Mark's page and that there are no duplicates (as in the case of the
test case descriptions).
2-Mary, Mark: Determine what should be monitored for tests involving
DBT2 (an OLTP-like database workload) to detect when/if taking a CPU
on-line or off-line does not redistribute the load properly.
3-Bryce: Adjust the pseudo code according to what we discussed on the
call (in the minutes below).
4-Bryce: Start posting results of member patch boot tests for the three
architectures at OSDL on his test web site.
5-Bryce: Ask the community whether consistent reporting across
architectures is important for statistics tools when they account for
resource additions/removals (e.g. with SAR, removing a cpu causes the
cpu to be reported with zero interrupts in one case, while in another
case the cpu is not reported at all. Is this OK?).
6-Mary, Bryce, Mark: Determine what statistics tools are affected by
hotplug events. We know TOP and SAR. What else?
7-Use Case contributors need to review the draft Martine has forwarded
via private email.

Announcements:
* Dedicated OpenHPI conference call
-Will be on July 11th, 4:00pm-5:00pm EST, 1-2pm Pacific
-Informal discussion along these lines:
---How much control will there be for hotplug events?
---How will the control be established (the rules)?
---This is what OpenHPI allows you to do.
-Everyone will be invited.
* OLS activities
Paper entitled "Memory Hotplug Redux", presented by Joel Schopp, and
Memory Hotplug BOF led by Dave Hansen.
Note the schedule is finally out:

Topics from the agenda:

* Hotplug CPU test cases and web page.
-Bryce's TODO list was posted just prior to the meeting.
-Over the last month he has worked on getting the hardware working and
set up for project access. The Power systems are now in racks, the HMC
is hooked up, and it should be networked soon. He was assigned an
Itanium II system; however, the OS on it was missing a lot, and he is
still trying to get it to boot with the memory patches in place
(probably a config issue).
-He is now working with three architectures: IA32, PPC64, and IA64
(Itanium II). He is able to boot IA32 and Power with the latest -mhp
patches. The Itanium II is not booting and, as noted above, that is
probably not the fault of the patch. He will start posting the results
of his boot tests on the test web site.
-We discussed Bryce's just-completed pseudo code. We determined that it
should be a small unit of code that can be run repeatedly by a call
from another script. Also, there will be an easy-to-use background
activity, so that developers can set up and run a basic quick test
easily. We will note subcases that involve repeated iterations and more
intensive background activities, to be run at OSDL, resources
permitting.
-Test case 1: We need to check what CPUs are allowed in the bit mask, so
we don't accidentally use a CPU that isn't allowed to do the I/O
activity. We may also need to break the I/O activity out of the script
so we can vary what it is, but create a simple one for the quick and
easy version. You might want to try to off-line or on-line all
available cpus when you iterate over this test.
-Test case 5: We need to determine what tools beyond TOP need to be
checked for regressions. SAR is most likely one. Do the utilities need
to be consistent across architectures?
For example, SAR reports zero interrupts for an off-lined cpu in one
case, but in another case the cpu disappears from the report. This
needs to be addressed with the community.

* Update of patch fetching scripts.
PLM scripts: Bryce has corrected some of the issues raised, so that the
script reports what it is and respects the robots file. He still needs
to throttle how often it runs (now every 4 hours, but they want to make
that configurable). They also want to trigger a test when a patch first
appears, and he has the auto-detecting (of the new patch) part done.

* Status of memory testing (hardware? boot w/ new patches?)
As noted above, the latest memory patches are booting on IA32 and PPC64,
but not on IA64 (probably due to a configuration issue).

* Status of memory patches
Martine reported that part of the sparse memory patches has actually
been accepted in mainline. The extreme sparse memory patches are still
being developed, but may eventually be used for all cases. In the
meantime the old code, discontig mem, is available in mainline, but
will likely disappear eventually.

* Status on CPU patches
Natalie is working on fixing patches for the ES7000 for the latest mm
kernel, for which things got worse, not better. She started porting
their patch. It works OK on a generic 8-way Intel. She hasn't tried it
on x86-64 yet (the system is in repair). There are many problems
reported on 2.6.13-mm1, so those could be related to the problems she
is having. Natalie is still trying to fix what she thinks is a
scheduling problem with IA32. SGI is posting a debugger that she plans
to use to step through the problem (there is no stack trace produced
when it hangs on boot).

* Use Case status:
-Martine merged in Mary's comments for "Hotplug in Virtualization", and
she pinged the other people for comments. She will post it on the list
soon.
-Martine (and Mary) tried to contact Silvester, with no response.
Silvester had volunteered to do the use case on dynamic partitioning.

No new business was raised.
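
Editor's sketch for the Test case 1 discussion above (checking which
CPUs the bit mask allows before choosing one for the I/O activity, then
off-lining and on-lining CPUs). This is not Bryce's pseudo code. It is a
minimal illustration that assumes a Linux kernel exposing the sysfs
files /sys/devices/system/cpu/online and cpuN/online, and Python 3 with
os.sched_getaffinity. The helper names (parse_cpu_list, usable_cpus,
read_online_cpus, set_cpu_online, pick_io_cpu) are hypothetical.

```python
import os
from pathlib import Path

# Assumption: the kernel exposes CPU hotplug control files in this directory.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Turn a kernel CPU list string such as "0-3,5" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def usable_cpus(online, allowed):
    """Return the sorted CPUs that are both online and allowed by the mask."""
    return sorted(set(online) & set(allowed))


def read_online_cpus():
    """Read the set of currently online CPUs from sysfs."""
    return parse_cpu_list((SYSFS_CPU / "online").read_text())


def set_cpu_online(cpu, online):
    """Write 1 or 0 to cpuN/online (needs root). False if no control file."""
    control = SYSFS_CPU / f"cpu{cpu}" / "online"
    if not control.exists():
        return False  # some CPUs expose no online file and cannot be toggled
    control.write_text("1" if online else "0")
    return True


def pick_io_cpu():
    """Choose a CPU for the I/O activity that the affinity mask allows."""
    candidates = usable_cpus(read_online_cpus(), os.sched_getaffinity(0))
    if not candidates:
        raise RuntimeError("no online CPU is allowed by the affinity mask")
    return candidates[0]
```

One iteration of the test could call pick_io_cpu() to place the
background activity, then call set_cpu_online(cpu, False) followed by
set_cpu_online(cpu, True) for each of the other CPUs. Keeping the I/O
activity outside these helpers leaves it free to vary, as discussed.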

Next meeting, August 2.

-- 
Mary Edie Meredith
maryedie@xxxxxxxx
503-906-1942
Data Center Linux Initiative Manager
Open Source Development Labs