These are the minutes from the Hotplug SIG con call/F2F meeting held 01/20/2005.

Attendees:
  AMD     Kyle McDonald
  Bull    George Mann
  EMC     Ric Wheeler
  HP      Martine Silbermann
  IBM     Gerrit Huizenga, Badari Pulavarty, Joel Schopp
  Intel   Mats Wichmann
  Novell  Clyde Griffin, Alan Clark
  NTT     Yoshifumi Manabe
  OSDL    Lynn de la Torre, Mary Edie Meredith, Mark Wong, Craig Thomas
  Unisys  Bruce Vessey

* Presentation to the DCL Technical Working Group:
  Martine gave a presentation on the status of the Hotplug SIG: current projects, issues, needs, and goals for the upcoming year. The presentation can be viewed in both .ppt and .sxi formats on our Hotplug SIG webpage:
  http://developer.osdl.org/maryedie/HOTPLUG/docs
  It will be posted shortly if it's not there yet :-)

* Memory Hotplug testing:
  - As most of you following this list have noticed, Mark's attempts to boot his system with the new memory hotplug patches and/or to run specific memory tests on them have brought several issues with those patches to light. Those issues were also posted to the SourceForge memory hotplug list and were addressed by Dave Hansen. One thing we (the SIG) need for these tests is an explanation of what each test exercises and what the expected results should be.
  - The topic of defining milestones for the implementation of memory removal will be addressed at the Hotplug BOF at LWE. It was suggested that we create a table outlining the different milestones, how we can test against them, and what we need in order to reach those milestones and to test accurately and efficiently.
  - Submitting memory hotplug patches to mainline in small increments: to help Dave Hansen achieve this goal, we propose to run a battery of tests through STP on the small patches he plans to submit to Andrew. Each small patch will undergo a different set of tests depending on its content. It was suggested that these patches be tested against both mainline and the -mm tree.
    AR: Mary will generate a matrix that maps categories of changes (such as scheduler changes) to specific tests, so we know which tests are best suited to each patch.

* CPU Hotplug testing:
  The current CPU test scripts only take CPUs online and offline. They do no further checking, for example to verify that interrupts were properly retargeted to another CPU and that no interrupt was lost. We had an extensive discussion on how to verify that a CPU has actually been taken offline at the right time without disrupting or losing interrupt handling. The goal is to verify proper rerouting of interrupts without writing platform-specific tests. Several solutions were proposed, among them:
  - On ppc64, use DLPAR to make sure the CPU has changed partition and that interrupts have migrated properly. This is obviously an architecture-specific solution, but it's a great "first test" to validate the code.
  - Run a processor-intensive workload and monitor the change in workload and performance after a CPU is taken offline.
  - Use system-monitoring utilities (whether current utilities can handle a CPU being taken offline is TBD).

* CPU Hotplug Documentation:
  A request was made to put the current draft on the SIG homepage.
    AR: Mary will update the page.

* Hotplug at the OpenHPI/IPMI level:
  It was suggested that, to help the community in general understand OpenHPI and where it fits, one of the SIG's first goals should be to publish a "status report" including information such as which platforms are supported, the future roadmap, etc.

* Hardware issues:
  - The donated system is currently in the lab but has not yet booted successfully. Once it does, Joel has volunteered to help determine whether adequate firmware to support hotplug is installed and to get the system properly configured.
  - We should also run the regression tests on other architectures, such as ia64, x86_64, etc.
    We need to raise this with Tom Hanrahan to discuss the availability of such hardware in the OSDL lab.

* PLM issues:
  There seems to be a glitch in the current version of PLM: error reports do not describe the source of the error (warnings are reported correctly).
    AR: Craig Thomas will look into it and report back to us.

* Other issues:
  - Ric from EMC asked whether hotplug Serial ATA support should be a P1 item (it is currently P2). It will become very attractive to data centers and therefore should be addressed.
  - Is there a need for a product manager to help the memory hotplug project, for example to create a progress roadmap that clearly states features, capabilities, time to market, etc.?
  - Co-authoring a paper with Joel Schopp on "Memory Hotplug Redux" for OLS 2005: Martine is considering participating, and anyone is invited to join.

* Next meeting:
  The next scheduled meeting on 02/01 will be pushed out by one week because it conflicts with the OSDL Enterprise Linux Summit 2005, which a few of us are attending. Our next con call will therefore be on 02/08 at 11:00am Pacific. (In case you're wondering, I didn't want to cancel it because the following scheduled meeting falls in the week of LWE in Boston, and hopefully quite a few of us will attend the Hotplug BOF.) I'll send a reminder shortly before the meeting.

Thanks for your participation.

Martine J. Silbermann