Re: [PATCH v2 0/5] Add movablecore_map boot option

On Thu, 2012-11-29 at 10:25 +0800, Jiang Liu wrote:
> On 2012-11-29 9:42, Jaegeuk Hanse wrote:
> > On Wed, Nov 28, 2012 at 04:47:42PM +0800, Jiang Liu wrote:
> >> Hi all,
> >> 	Seems it's a great chance to discuss the memory hotplug feature
> >> within this thread. So I will try to give some high level thoughts about the memory
> >> hotplug feature on x86/IA64. Any comments are welcome!
> >> 	First of all, I think usability really matters. Ideally, memory hotplug
> >> feature should just work out of box, and we shouldn't expect administrators to 
> >> add several extra platform dependent parameters to enable memory hotplug. 
> >> But how to enable memory (or CPU/node) hotplug out of box? I think the key point
> >> is to cooperate with BIOS/ACPI/firmware/device management teams. 
> >> 	I still position memory hotplug as an advanced feature for high end 
> >> servers and those systems may/should provide some management interfaces to 
> >> configure CPU/memory/node hotplug features. The configuration UI may be provided
> >> by BIOS, BMC or centralized system management suite. Once administrator enables
> >> hotplug feature through those management UI, OS should support system device
> >> hotplug out of box. For example, HP SuperDome2 management suite provides interface
> >> to configure a node as floating node(hot-removable). And OpenSolaris supports
> >> CPU/memory hotplug out of box without any extra configurations. So we should
> >> shape interfaces between firmware and OS to better support system device hotplug.

Well described.  I agree with you.  I am also OK to have the boot option
for the time being, but we should be able to get the info from ACPI for
better TCE.

> >> 	On the other hand, I think there are no commercially available x86/IA64
> >> platforms with system device hotplug capabilities in the field yet, or at most
> >> a limited quantity. So backward compatibility is not a big issue for us now.

HP SuperDome is IA64-based and supports node hotplug when running
HP-UX.  It implements a vendor-unique ACPI interface to describe movable
memory ranges.

> >> So I think it's doable to rely on firmware to provide better support for system
> >> device hotplug.
> >> 	Then what should be enhanced to better support system device hotplug?
> >>
> >> 1) ACPI specification should be enhanced to provide a static table to describe
> >> components with hotplug features, so OS could reserve special resources for
> >> hotplug at early boot stages. For example, to reserve enough CPU ids for CPU
> >> hot-add. Currently we guess maximum number of CPUs supported by the platform
> >> by counting CPU entries in APIC table, that's not reliable.

Right.  HP SuperDome implements a vendor-unique ACPI interface for this as
well.  For Linux, it would be nice to have a standard interface defined.
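
To illustrate why counting APIC table entries is only a guess, here is a
minimal sketch (not part of the patchset) that counts Processor Local APIC
entries in a raw MADT image and separates enabled from disabled ones.  The
function name is hypothetical; the offsets follow my reading of the ACPI
MADT layout (36-byte header, 4-byte local APIC address, 4-byte flags), and
only type-0 entries are handled, so x2APIC entries are ignored.

```python
import struct

# Assumed layout: 36-byte ACPI header + 4-byte LAPIC address + 4-byte flags.
MADT_ENTRIES_OFFSET = 44
LOCAL_APIC_TYPE = 0
# type, length, ACPI processor UID, APIC ID, flags
LOCAL_APIC_FMT = "<BBBBI"
LAPIC_ENABLED = 1 << 0


def count_local_apics(madt: bytes):
    """Return (enabled, disabled) counts of type-0 Local APIC entries."""
    enabled = disabled = 0
    off = MADT_ENTRIES_OFFSET
    while off + 2 <= len(madt):
        typ, length = madt[off], madt[off + 1]
        # Stop on a malformed entry rather than looping forever.
        if length < 2 or off + length > len(madt):
            break
        if typ == LOCAL_APIC_TYPE and length == struct.calcsize(LOCAL_APIC_FMT):
            flags = struct.unpack_from(LOCAL_APIC_FMT, madt, off)[4]
            if flags & LAPIC_ENABLED:
                enabled += 1
            else:
                disabled += 1
        off += length
    return enabled, disabled
```

The disabled count is only a hint: firmware is free to list more or fewer
disabled entries than the platform can actually hot-add, which is the
unreliability mentioned above.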

> >> 2) BIOS should implement SRAT, MPST and PMTT tables to better support memory
> >> hotplug. SRAT associates memory ranges with proximity domains with an extra
> >> "hotpluggable" flag. PMTT provides memory device topology information, such
> >> as "socket->memory controller->DIMM". MPST is used for memory power management
> >> and provides a way to associate memory ranges with memory devices in PMTT.
> >> With all information from SRAT, MPST and PMTT, OS could figure out hotplug
> >> memory ranges automatically, so no extra kernel parameters needed.

I agree that using SRAT is a good compromise.  The hotpluggable flag is
supposed to indicate the platform's capability, but it could be used for
this purpose until we have a better interface defined.
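
As a rough sketch of what "using SRAT" would mean (an illustration under
stated assumptions, not the patchset's code), the following walks a raw SRAT
image and lists the enabled memory ranges that carry the hot-pluggable flag.
The function name is hypothetical; the 48-byte table header and the 40-byte
Memory Affinity Structure layout are taken from my reading of the ACPI
specification.

```python
import struct

# Assumed layout: 36-byte ACPI header + 12 reserved bytes before the entries.
SRAT_ENTRIES_OFFSET = 48
MEM_AFFINITY_TYPE = 1
# type, length, proximity domain, reserved, base (64-bit), length (64-bit),
# reserved, flags, reserved
MEM_AFFINITY_FMT = "<BBIHQQIIQ"
MEM_ENABLED = 1 << 0
MEM_HOT_PLUGGABLE = 1 << 1


def hotpluggable_ranges(srat: bytes):
    """Return [(proximity_domain, base, size)] for hot-pluggable ranges."""
    ranges = []
    off = SRAT_ENTRIES_OFFSET
    while off + 2 <= len(srat):
        typ, length = srat[off], srat[off + 1]
        # Stop on a malformed entry rather than looping forever.
        if length < 2 or off + length > len(srat):
            break
        if typ == MEM_AFFINITY_TYPE and length == struct.calcsize(MEM_AFFINITY_FMT):
            fields = struct.unpack_from(MEM_AFFINITY_FMT, srat, off)
            domain, base, size, flags = fields[2], fields[4], fields[5], fields[7]
            if flags & MEM_ENABLED and flags & MEM_HOT_PLUGGABLE:
                ranges.append((domain, base, size))
        off += length
    return ranges
```

On a running Linux system the table can usually be read as root from
/sys/firmware/acpi/tables/SRAT and passed to the function as bytes.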

> >> 3) Enhance ACPICA to provide a method to scan static ACPI tables before
> >> memory subsystem has been initialized because OS need to access SRAT,
> >> MPST and PMTT when initializing memory subsystem.

I do not think this is an ACPICA issue.  HP-UX also uses ACPICA, and can
access ACPI tables and walk the ACPI namespace during early boot.  This
is achieved by having the acpi_os layer use a special early boot-time
memory allocator.  As a result, the boot-time and hot-add configuration
code are very consistent in HP-UX.

> >> 4) The last and the most important issue is how to minimize performance
> >> drop caused by memory hotplug. As proposed by this patchset, once we
> >> configure all memory of a NUMA node as movable, it essentially disables
> >> NUMA optimization of kernel memory allocation from that node. According
> >> to experience, that will cause huge performance drop. We have observed
> >> 10-30% performance drop with memory hotplug enabled. And on another
> >> OS the average performance drop caused by memory hotplug is about 10%.
> >> If we can't resolve the performance drop, memory hotplug is just a feature
> >> for demo:( With help from hardware, we do have some chances to reduce
> >> performance penalty caused by memory hotplug.
> >> 	As we know, Linux could migrate movable pages, but can't migrate
> >> non-movable pages used by kernel/DMA etc. And the hardest part is how
> >> to deal with those unmovable pages when hot-removing a memory device.
> >> Now hardware has given us a hand with a technology named memory migration,
> >> which could transparently migrate memory between memory devices. There's
> >> no OS visible changes except NUMA topology before and after hardware memory
> >> migration.
> >> 	And if there are multiple memory devices within a NUMA node,
> >> we could configure some memory devices to host unmovable memory and the
> >> other to host movable memory. With this configuration, there won't be
> >> bigger performance drop because we have preserved all NUMA optimizations.
> >> We also could achieve memory hotplug remove by:
> >> 1) Use existing page migration mechanism to reclaim movable pages.
> >> 2) For memory devices hosting unmovable pages, we need:
> >> 2.1) find a movable memory device on other nodes with enough capacity
> >> and reclaim it.
> >> 2.2) use hardware migration technology to migrate unmovable memory to
> >> the just reclaimed memory device on other nodes.
> >>
> >> 	I hope we could expect users to adopt memory hotplug technology
> >> with all these implemented.
> >>
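
The two-step removal flow quoted above can be sketched as a small planning
function.  This is only a toy model to make the ordering concrete: the
MemDevice type, the action names, and the first-fit choice of target device
are all assumptions for illustration, and no real kernel or firmware
interface is involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MemDevice:
    name: str       # hypothetical label, e.g. "n0d0"
    node: int       # NUMA node the device belongs to
    capacity: int   # capacity in arbitrary units
    movable: bool   # True if it hosts only movable pages


def plan_hot_remove(victim: MemDevice,
                    devices: List[MemDevice]) -> Optional[List[Tuple[str, ...]]]:
    """Return the ordered actions to hot-remove `victim`, or None."""
    if victim.movable:
        # Step 1: ordinary page migration is enough.
        return [("page_migrate", victim.name)]
    # Step 2.1: find a movable device on another node with enough capacity.
    for dev in devices:
        if dev.movable and dev.node != victim.node and dev.capacity >= victim.capacity:
            return [
                ("reclaim", dev.name),
                # Step 2.2: hardware migration into the reclaimed device.
                ("hw_migrate", victim.name, dev.name),
            ]
    return None  # no suitable target, so the removal cannot proceed
```

A None result corresponds to the case where no other node has a large
enough movable device, in which case the unmovable device has to stay.
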
> >> 	Back to this patch, we could rely on the mechanism provided
> >> by it to automatically mark memory ranges as movable with information
> >> from ACPI SRAT/MPST/PMTT tables. So we don't need administrator to
> >> manually configure kernel parameters to enable memory hotplug.

Right.

Thanks,
-Toshi

