Hi,

On Wednesday, November 07, 2012 12:18:07 AM Jiang Liu wrote:
> Hi Bjorn,
> 	Thanks for your review and please refer to inlined comments below.
> 
> On 11/06/2012 05:05 AM, Bjorn Helgaas wrote:
> > On Sat, Nov 3, 2012 at 10:07 AM, Jiang Liu <liuj97@xxxxxxxxx> wrote:
> >> Modern high-end servers may support advanced RAS features, such as
> >> system device dynamic reconfiguration.  On x86 and IA64 platforms,
> >> a system device means a processor (CPU), memory device, PCI host
> >> bridge or even a computer node.
> >>
> >> The ACPI specifications provide standard interfaces between firmware
> >> and OS to support device dynamic reconfiguration at runtime.  This
> >> patch series introduces a new framework for system device dynamic
> >> reconfiguration based on the ACPI specification, which will replace
> >> the existing system device hotplug logic embedded in the ACPI
> >> processor/memory/container device drivers.
> >>
> >> The new ACPI based hotplug framework is modelled after the PCI hotplug
> >> architecture and targets the following goals:
> >> 1) Optimize device configuration order to achieve best performance for
> >>    hot-added system devices.  For best performance, system devices
> >>    should be configured in the order memory -> CPU -> IOAPIC/IOMMU ->
> >>    PCI host bridge.
> >> 2) Resolve dependencies among hotplug slots.  You need to remove the
> >>    memory device first before removing a physical processor if a
> >>    hotpluggable memory device is connected to a hotpluggable physical
> >>    processor.
> >
> > Doesn't the namespace already have a way to communicate these dependencies?
> The namespace could resolve most dependency issues, but there are still
> several corner cases that need special care.
> 1) On a typical Intel Nehalem/Westmere platform, an IOH will be connected
>    to two physical processors through QPI.  The IOH depends on the two
>    processors, and the ACPI namespace is something like:
>    /_SB
>     |_SCK0
>     |_SCK1
>     |_PCI1
> 2) For a large system composed of multiple computer nodes, nodes may have
>    dependencies on neighbors due to interconnect topology constraints.
> 
> So we need to resolve dependencies both by evaluating _EDL and by
> analyzing the ACPI namespace topology.

Well, this doesn't explain why we need a new framework.
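For reference, picking the dependency edges out of _EDL itself is the easy
part; the ACPICA helpers the kernel already exports are enough.  A minimal
sketch (the helper name and callback shape below are made up for
illustration and are not taken from this patch series):

#include <linux/acpi.h>
#include <linux/slab.h>

/*
 * Sketch: walk a slot's _EDL (Eject Device List) and hand each dependent
 * device to a caller-supplied callback, so removals can be ordered before
 * ejecting the slot itself.  Error handling is simplified.
 */
static acpi_status acpihp_walk_edl(acpi_handle handle,
				   acpi_status (*cb)(acpi_handle, void *),
				   void *data)
{
	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
	union acpi_object *edl;
	acpi_status status;
	int i;

	status = acpi_evaluate_object(handle, "_EDL", NULL, &buffer);
	if (ACPI_FAILURE(status))
		return status;	/* no _EDL: no explicit dependencies */

	edl = buffer.pointer;
	if (!edl || edl->type != ACPI_TYPE_PACKAGE) {
		status = AE_BAD_DATA;
		goto out;
	}

	/* Each package element is a reference to a dependent device. */
	for (i = 0; i < edl->package.count; i++) {
		union acpi_object *obj = &edl->package.elements[i];

		if (obj->type != ACPI_TYPE_LOCAL_REFERENCE)
			continue;
		status = cb(obj->reference.handle, data);
		if (ACPI_FAILURE(status))
			break;
	}
out:
	kfree(buffer.pointer);
	return status;
}

The interesting question is what you do with those edges once you have
them, and whether the namespace hierarchy alone could express the same
ordering.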
> >> 3) Provide an interface to cancel ongoing hotplug operations.  It may
> >>    take a very long time to remove a memory device, so provide an
> >>    interface to cancel in-progress hotplug operations.
> >> 4) Support new advanced RAS features, such as socket/memory migration.
> >> 5) Provide better user interfaces to access the hotplug functionality.
> >> 6) Provide a mechanism to detect hotplug slots by checking for the
> >>    existence of the ACPI _EJ0 method or by other hardware platform
> >>    specific methods.
> >
> > I don't know what "hotplug slot" means for ACPI.  ACPI allows hotplug
> > of arbitrary devices in the namespace, whether they have _EJ0 or not.
> Here a "hotplug slot" is an abstraction of a receptacle where a group of
> system devices could be attached, or where we could control a group of
> system devices.  It's totally conceptual and may or may not have a
> corresponding physical slot.

Can that be called something different from "slot", then, to avoid confusion?

> For example,
> 1) a hotplug slot for a hotpluggable memory board has a physical slot.

So let's call that a "slot" and the abstraction above a "hotplug domain" or
something similar.  Because in fact we're talking about hotplug domains,
aren't we?

> 2) a hotplug slot for a non-hotpluggable processor with power control
>    capability has no physical slot.  (That means you may power a physical
>    processor on/off but can't hotplug it at runtime.)  This case is
>    useful for hardware partitioning.

People have been working on this particular thing for years, so I wonder
why you think that your approach is going to be better here?

> Detecting hotplug slots by checking for the existence of _EJ0 is the
> default but unreliable way.  A real high-end server with system device
> hotplug capabilities should provide some static ACPI table to describe
> hotplug slots/capabilities.  There are some ongoing efforts for that from
> Intel, but not in the public domain yet.  So the hotplug slot enumeration
> driver is designed to be extensible :)
> 
> >> 7) Unify the way to enumerate ACPI based hotplug slots.  All hotplug
> >>    slots will be enumerated by the enumeration driver (acpihp_slot),
> >>    instead of by individual ACPI device drivers.
> >
> > Why do we need to enumerate these "slots" specifically?
> >
> > I think this patch adds things in /sys.  It might help if you
> > described what they are.
> There's no standard way in ACPI 5.0 to describe system device hotplug
> slots yet.  And we want to show the user the system device hotplug
> capabilities even when there is no device attached to a slot.  In other
> words, users could know how many devices they could connect to the system
> by hotplugging.

Bjorn probably meant "provide documentation describing the user space
interfaces being introduced".  Which in fact is a requirement.

> >> 8) Unify the way to handle ACPI hotplug events.  All ACPI hotplug
> >>    events for system devices will be handled by a generic ACPI hotplug
> >>    driver (acpihp_drv) instead of by individual ACPI device drivers.
> >> 9) Provide better error handling and error recovery.
> >> 10) Trigger hotplug events/operations by software.  This feature is
> >>    useful for hardware fault management and/or power saving.
> >>
> >> The new framework is composed of three major components:
> >> 1) A system device hotplug slot enumerator driver, which enumerates
> >>    hotplug slots in the system and provides platform specific methods
> >>    to control those slots.
> >> 2) A system device hotplug driver, which is a platform independent
> >>    driver to manage all hotplug slots created by the slot enumerator.
> >>    The hotplug driver implements a state machine for hotplug slots and
> >>    provides user interfaces to manage hotplug slots.
> >> 3) Several ACPI device drivers to configure/unconfigure system devices
> >>    at runtime.
> >>
> >> To get rid of the interdependency between the slot enumerator and the
> >> hotplug driver, common code shared by them will be built into the
> >> kernel.  The shared code provides some helper routines and a device
> >> class named acpihp_slot_class with the following default sysfs
> >> properties:
> >>    capabilities: RAS capabilities of the hotplug slot
> >>    state: current state of the hotplug slot state machine
> >>    status: current health status of the hotplug slot
> >>    object: ACPI object corresponding to the hotplug slot
> >>
> >> Signed-off-by: Jiang Liu <jiang.liu@xxxxxxxxxx>
> >> Signed-off-by: Gaohuai Han <hangaohuai@xxxxxxxxxx>
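For the record, those four properties map onto an ordinary device class
with read-only device attributes.  A rough sketch of what that amounts to
with the class .dev_attrs mechanism of current kernels (the structure
layout and names below are guesses for illustration, not the actual patch
code):

#include <linux/device.h>
#include <linux/stat.h>

/* Illustrative only: the real acpihp_slot layout is defined by the patches. */
struct acpihp_slot {
	struct device	dev;
	const char	*state;		/* e.g. "powered", "configured" */
};
#define to_acpihp_slot(d)	container_of(d, struct acpihp_slot, dev)

static ssize_t state_show(struct device *dev, struct device_attribute *attr,
			  char *buf)
{
	return sprintf(buf, "%s\n", to_acpihp_slot(dev)->state);
}

/* capabilities, status and object would be exposed the same way. */
static struct device_attribute acpihp_slot_dev_attrs[] = {
	__ATTR(state, S_IRUGO, state_show, NULL),
	__ATTR_NULL,
};

static struct class acpihp_slot_class = {
	.name		= "acpihp_slot",
	.dev_attrs	= acpihp_slot_dev_attrs,
};

Every device registered in that class then gets the attributes
automatically, and that's also where the user space ABI documentation for
them would hang off.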
> >
> > ...
> >> +static char *acpihp_dev_mem_ids[] = {
> >> +	"PNP0C80",
> >> +	NULL
> >> +};
> >> +
> >> +static char *acpihp_dev_pcihb_ids[] = {
> >> +	"PNP0A03",
> >> +	NULL
> >> +};
> >
> > Why should this driver need to know about these PNP IDs?  We ought to
> > be able to support hotplug of any device in the namespace, no matter
> > what its ID.
> We need PNP IDs for:
> 1) Giving a meaningful name to each slot:
>    lrwxrwxrwx CPU00 -> ../../../devices/LNXSYSTM:00/acpihp/CPU00
>    lrwxrwxrwx CPU01 -> ../../../devices/LNXSYSTM:00/acpihp/CPU01
>    lrwxrwxrwx CPU02 -> ../../../devices/LNXSYSTM:00/acpihp/CPU02
>    lrwxrwxrwx CPU03 -> ../../../devices/LNXSYSTM:00/acpihp/CPU03
>    lrwxrwxrwx IOX01 -> ../../../devices/LNXSYSTM:00/acpihp/IOX01
>    lrwxrwxrwx MEM00 -> ../../../devices/LNXSYSTM:00/acpihp/CPU00/MEM00
>    lrwxrwxrwx MEM01 -> ../../../devices/LNXSYSTM:00/acpihp/CPU00/MEM01
>    lrwxrwxrwx MEM02 -> ../../../devices/LNXSYSTM:00/acpihp/CPU01/MEM02
>    lrwxrwxrwx MEM03 -> ../../../devices/LNXSYSTM:00/acpihp/CPU01/MEM03
>    lrwxrwxrwx MEM04 -> ../../../devices/LNXSYSTM:00/acpihp/CPU02/MEM04
>    lrwxrwxrwx MEM05 -> ../../../devices/LNXSYSTM:00/acpihp/CPU02/MEM05
>    lrwxrwxrwx MEM06 -> ../../../devices/LNXSYSTM:00/acpihp/CPU03/MEM06
>    lrwxrwxrwx MEM07 -> ../../../devices/LNXSYSTM:00/acpihp/CPU03/MEM07
> 
> 2) Classifying system devices into groups according to device type, so we
>    could configure/unconfigure them in the optimal order for performance:
>    memory -> CPU -> IOAPIC -> PCI host bridge
> 
> 3) The new hotplug framework is designed to handle system device hotplug
>    only; it won't handle IO device hotplug such as PCI etc., so it needs
>    to stop scanning at the subtrees of PCI host bridges.

Well, we probably need a hotplug domains framework, which the thing you're
proposing seems to be.  However, the question is: Why should it cover
"system devices" only?  To me, it looks like such a framework should cover
all hotplug devices in the system, or at least all ACPI-based hotplug
devices.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.