Re: [RFC][PATCH 03/26] mm, mpol: add MPOL_MF_LAZY ...

On 07/06/2012 04:04 PM, Lee Schermerhorn wrote:
On Fri, 2012-07-06 at 12:38 -0400, Rik van Riel wrote:

4. Putting a lot of pages in the swap cache ends up allocating
     swap space. This means this NUMA migration scheme will only
     work on systems that have a substantial amount of memory
     represented by swap space. This is highly unlikely on systems
     with memory in the TB range. On smaller systems, it could drive
     the system out of memory (to the OOM killer), by "filling up"
     the overflow swap with migration pages instead.
5. In the long run, we want the ability to migrate transparent
     huge pages as one unit.  The reason is simple: the performance
     penalty for running on the wrong NUMA node (10-20%) is on the
     same order of magnitude as the performance penalty for running
     with 4kB pages instead of 2MB pages (5-15%).

     Breaking up large pages into small ones, and having khugepaged
     reconstitute them on a random NUMA node later on, will negate
     the performance benefits of both NUMA placement and THP.

When I originally posted the "migrate on fault" series, I posted a
separate series with a "migration cache" to avoid the use of swap space
for lazy migration: http://markmail.org/message/xgvvrnn2nk4nsn2e.

The migration cache was originally implemented by Marcelo Tosatti for
the old memory hotplug project:
http://marc.info/?l=linux-mm&m=109779128211239&w=4.

The idea is that you don't need swap space for lazy migration, just an
"address_space" where you can park an anon VMA's pte's while they're
"unmapped" to cause migration faults.  Based on a suggestion from
Christoph Lameter, I had tried to hide the migration cache behind the
swap cache interface to minimize changes mainly in do_swap_page and
vmscan/reclaim.  It seemed to work, but the difference in reference
count semantics for the mig cache -- entry removed when last pte
migrated/mapped -- makes coordination with exit teardown, uh, tricky.
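
To make the reference count semantics described above concrete, here is a
toy user-space model, not kernel code. The names mig_entry, mig_cache_park
and mig_cache_fault are hypothetical and do not come from either patch
series; the sketch only assumes that an entry tracks how many ptes are
still parked on it and disappears when the last one is remapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a migration cache entry. */
struct mig_entry {
    int page_id;     /* placeholder for the parked page */
    int parked_ptes; /* ptes still unmapped and pointing at this entry */
};

/* Park nr_ptes ptes of one anon page in the cache. */
struct mig_entry *mig_cache_park(int page_id, int nr_ptes)
{
    struct mig_entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->page_id = page_id;
    e->parked_ptes = nr_ptes;
    return e;
}

/*
 * One pte takes its migration fault and is mapped again.
 * Returns true when this was the last parked pte, in which case the
 * entry is freed and *ep is cleared.
 */
bool mig_cache_fault(struct mig_entry **ep)
{
    struct mig_entry *e = *ep;

    if (--e->parked_ptes > 0)
        return false;
    free(e);
    *ep = NULL;
    return true;
}
```

In this model nothing outlives the last mapper, so any path that drops a
parked pte without faulting (exit teardown being the obvious one) would
presumably have to perform the same decrement, which is where the
coordination gets awkward.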

That fixes one of the two problems, but using _PTE_NUMA
or _PAGE_PROTNONE looks like it would both be easier
and solve both problems.
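
A rough user-space analogy of the protection-based idea, assuming Linux
and glibc: make a page inaccessible with mprotect(PROT_NONE), let the next
access trap, and restore access in the fault handler. No swap space is
involved. The function name count_hint_faults is hypothetical, and calling
mprotect() from a signal handler is an assumption that holds on Linux in
practice rather than a POSIX guarantee. This only illustrates the trapping
mechanism, not the kernel implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t hint_faults;
static char *region;
static size_t region_len;

/* Fault handler: count the fault, then make the page accessible again. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    char *addr = (char *)si->si_addr;

    (void)sig;
    (void)ctx;
    if (addr < region || addr >= region + region_len)
        _exit(1); /* not our page: treat as a genuine crash */
    hint_faults++;
    /* A kernel scheme would decide here whether to migrate the page. */
    mprotect(region, region_len, PROT_READ | PROT_WRITE);
}

/* Returns the number of faults taken for two reads of a protected page. */
int count_hint_faults(void)
{
    struct sigaction sa;
    volatile char *p;
    char first, second;

    region_len = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    region[0] = 'x';
    hint_faults = 0;
    if (mprotect(region, region_len, PROT_NONE) != 0)
        return -1;

    p = region;
    first = p[0];  /* traps; the handler restores access and the read retries */
    second = p[0]; /* no trap: the page is accessible again */
    if (first != 'x' || second != 'x')
        return -1;

    munmap(region, region_len);
    return (int)hint_faults;
}
```

The page contents stay where they are the whole time; only the
protection changes, which is why this style of trap needs neither swap
space nor a separate cache to hold the page.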

--
All rights reversed
