On 19 Feb 2019, at 19:18, Mike Kravetz wrote:
On 2/19/19 6:33 PM, Zi Yan wrote:
On 19 Feb 2019, at 17:42, Mike Kravetz wrote:
On 2/15/19 2:08 PM, Zi Yan wrote:
Thanks for working on this issue!
I have not yet had a chance to take a look at the code. However, I do
have some general questions/comments on the approach.
Thanks for replying. The code is very intrusive and has a lot of hacks,
so it is OK for us to discuss the general idea first. :)
Patch structure
----
The patchset I developed to generate physically contiguous
memory/arbitrary sized pages merely moves pages around. There are three
components in this patchset:
1) a new page migration mechanism, called exchange pages, that exchanges
the content of two in-use pages instead of performing two back-to-back
page migrations. It saves on overheads and avoids page reclaim and
memory compaction in the page allocation path, although it is not
strictly required if enough free memory is available in the system.
2) a new mechanism that utilizes both page migration and exchange pages
to produce physically contiguous memory/arbitrary sized pages without
allocating any new pages, unlike what khugepaged does. It works on a
per-VMA basis, creating physically contiguous memory out of each VMA,
which is virtually contiguous. A simple range tree is used to ensure
that no two VMAs overlap with each other in the physical address space.
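To illustrate what item 1) saves, here is a user-space sketch of only the
data movement in an exchange: the contents of two equally sized pages are
swapped through a small bounce buffer, so no third full page has to be
allocated. The page size, the chunk size, and the name
exchange_page_contents() are assumptions for illustration. This is not
the patchset's exchange_pages() code, and the page locking, unmapping,
and remapping that a kernel implementation needs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size for this sketch */
#define SKETCH_CHUNK     256  /* hypothetical bounce-buffer size */

/* Swap the contents of two page-sized buffers in place, using only a
 * small stack buffer instead of a third full page. Data movement only:
 * no locking or page-table handling (hypothetical helper). */
static void exchange_page_contents(unsigned char *a, unsigned char *b)
{
    unsigned char tmp[SKETCH_CHUNK];

    assert(a != b);
    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += SKETCH_CHUNK) {
        memcpy(tmp, a + off, SKETCH_CHUNK);
        memcpy(a + off, b + off, SKETCH_CHUNK);
        memcpy(b + off, tmp, SKETCH_CHUNK);
    }
}
```

Two back-to-back migrations would instead copy each page into a freshly
allocated page, which is where the need for free memory, and hence
reclaim or compaction, comes from.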
This appears to be a new approach to generating contiguous areas.
Previous attempts had relied on finding a contiguous area that can then
be used for various purposes including user mappings. Here, you take an
existing mapping and make it contiguous. The patch "[RFC PATCH 04/31]
mm: add mem_defrag functionality" talks about creating a (VPN, PFN)
anchor pair for each vma and then using this pair as the base for
creating a contiguous area.
I'm curious, how 'fixed' is the anchor? As you know, there could be a
non-movable page in the PFN range. As a result, you will not be able to
create a contiguous area starting at that PFN. In such a case, do we try
another PFN? I know this could result in much page shuffling. I'm just
trying to figure out how we satisfy a user who really wants a contiguous
area. Is there some method to keep trying?
Good question. The anchor is determined on a per-VMA basis, which can be
changed easily, but in this patchset I used a very simple strategy:
keeping all VMAs non-overlapping in the physical address space to get
maximum overall contiguity, and not changing anchors even if non-movable
pages are encountered when generating physically contiguous pages.
Basically, the first VMA (VMA1) in the virtual address space has its
anchor at (VMA1_start_VPN, ZONE_start_PFN), the second VMA (VMA2) has
its anchor at (VMA2_start_VPN, ZONE_start_PFN + VMA1_size), and so on.
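A minimal sketch of that placement rule, assuming VMAs are visited in
ascending virtual-address order and sizes are counted in pages: each
anchor PFN is the zone's start PFN plus the sizes of all preceding VMAs,
and a page at virtual page number vpn is meant to land at
anchor_pfn + (vpn - anchor_vpn). struct vma_desc, assign_anchors(), and
target_pfn() are hypothetical names, not code from the patchset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-VMA description, in page units. */
struct vma_desc {
    unsigned long start_vpn;  /* first virtual page number */
    unsigned long nr_pages;   /* VMA size in pages */
    unsigned long anchor_vpn; /* anchor pair, filled in by */
    unsigned long anchor_pfn; /* assign_anchors() */
};

/* Place VMAs (sorted by start_vpn) back to back in physical space,
 * starting at the zone's first PFN, so no two VMAs overlap. */
static void assign_anchors(struct vma_desc *vmas, size_t n,
                           unsigned long zone_start_pfn)
{
    unsigned long next_pfn = zone_start_pfn;

    for (size_t i = 0; i < n; i++) {
        vmas[i].anchor_vpn = vmas[i].start_vpn;
        vmas[i].anchor_pfn = next_pfn;
        next_pfn += vmas[i].nr_pages;
    }
}

/* Target PFN for a virtual page inside a VMA, given its anchor pair. */
static unsigned long target_pfn(const struct vma_desc *vma,
                                unsigned long vpn)
{
    assert(vpn >= vma->start_vpn && vpn < vma->start_vpn + vma->nr_pages);
    return vma->anchor_pfn + (vpn - vma->anchor_vpn);
}
```

For example, with two VMAs of 16 and 8 pages and a zone starting at PFN
0x10000, the second VMA is anchored at PFN 0x10010, and its page at VPN
0x803 (three pages past its start at 0x800) targets PFN 0x10013.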
This keeps all VMAs from overlapping in the physical address space
during contiguous memory generation. When there is a non-movable page,
the anchor is not changed, because whether or not we assign a new
anchor, the contiguous pages stop at the non-movable page. If we try to
get a new anchor, more effort is needed to avoid overlapping the new
anchor with existing contiguous pages. Any overlap will nullify the
existing contiguous pages.
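Avoiding that overlap comes down to an interval-overlap test of a
candidate physical range against the ranges already claimed by other
VMAs. The sketch below uses half-open PFN ranges and a linear scan as a
stand-in for the range tree the patchset uses; struct pfn_range and
pfn_range_overlaps() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A physical range already claimed by a VMA: [start_pfn, end_pfn). */
struct pfn_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Return true if the candidate [start, end) overlaps any claimed range.
 * A linear scan stands in for a range-tree lookup. */
static bool pfn_range_overlaps(const struct pfn_range *ranges, size_t n,
                               unsigned long start, unsigned long end)
{
    assert(start < end);
    for (size_t i = 0; i < n; i++) {
        if (start < ranges[i].end_pfn && ranges[i].start_pfn < end)
            return true;
    }
    return false;
}
```

A tree keyed by start PFN makes the same lookup logarithmic instead of
linear; the overlap condition itself is unchanged.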
To satisfy a user who wants a contiguous area with N pages, the distance
between two adjacent non-movable pages must be bigger than N pages
somewhere in the system memory. Otherwise, nothing would work. If there
is such an area (PFN1, PFN1+N) in the physical address space, you can
set the anchor to (VPN_USER, PFN1) and use exchange_pages() to generate
a contiguous area with N pages. Alternatively, alloc_contig_pages(PFN1,
PFN1+N, …) could also work, but only at page allocation time. It also
requires the system to have N free pages while alloc_contig_pages() is
migrating the pages in (PFN1, PFN1+N) away, or you need to swap pages
out to make the space.
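Locating such an area (PFN1, PFN1+N) amounts to finding a run of N
consecutive movable frames. A minimal sketch, assuming a hypothetical
per-PFN flag array that marks non-movable frames (a real kernel scan
would more likely consult page flags and pageblock migratetypes);
find_movable_window() is an illustrative name, not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scan a hypothetical non-movable map (index i is PFN base_pfn + i) for
 * the first run of n movable frames. On success, store the run's first
 * PFN in *out_pfn and return true. */
static bool find_movable_window(const bool *nonmovable, size_t nr_pfns,
                                unsigned long base_pfn, size_t n,
                                unsigned long *out_pfn)
{
    size_t run = 0; /* length of the current run of movable frames */

    assert(n > 0);
    for (size_t i = 0; i < nr_pfns; i++) {
        if (nonmovable[i]) {
            run = 0; /* a non-movable frame breaks the run */
            continue;
        }
        if (++run == n) {
            *out_pfn = base_pfn + (i + 1 - n);
            return true;
        }
    }
    return false;
}
```

With non-movable frames at offsets 1 and 5 of a 10-frame map starting at
PFN 100, a 3-page request finds PFN 102 and a 4-page request finds PFN
106, while a 5-page request fails because no gap is long enough.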
Let me know if this makes sense to you.
Yes, that is how I expected the implementation would work. Thank you.
Another high-level question. One of the benefits of this approach is
that exchanging pages does not require N free pages as you describe
above. This assumes that the vma which we are trying to make contiguous
is already populated. If it is not populated, then you also need to have
N free pages. Correct? If this is true, then is the expected use case to
first populate a vma, and then try to make it contiguous? I would assume
that if it is not populated and a request to make it contiguous is
given, we should try to allocate/populate the vma with contiguous pages
at that time?
Yes, I assume the pages within the VMA are already populated but not
contiguous yet.
My approach considers memory contiguity as an on-demand resource. In
some phases of an application, accelerators or RDMA controllers would
process/transfer data in one or more VMAs, at which time contiguous
memory can help reduce address translation overheads or lift certain
constraints. Different VMAs could be processed at different program
phases, so it might be hard to get contiguous memory for all these VMAs
at allocation time using alloc_contig_pages(). My approach can help get
contiguous memory later, when the demand comes.
In some cases, you can certainly use alloc_contig_pages() to give users
a contiguous area at page allocation time, if you know the user is going
to use this area for accelerator data processing or as an RDMA buffer
and the area size is fixed.
In addition, we could also use a khugepaged-like approach, having a
daemon periodically scan VMAs and use alloc_contig_pages() to convert
non-contiguous pages in a VMA to contiguous pages, but it would require
N free pages during the conversion.
In sum, my approach complements alloc_contig_pages() and provides more
flexibility. It is not trying to replace alloc_contig_pages().
--
Best Regards,
Yan Zi