I apologise if this message arrives at the mailing list more than once.
I had connection problems this morning while trying to send it, so I
cannot tell whether it went through properly. I'm sorry for the
inconvenience.
On 2017-10-08 18:47 Mike Kravetz wrote:
> You are correct. That check in function vma_to_resize() will prevent
> mremap from growing or relocating hugetlb backed mappings. This check
> existed in the 2.6.0 linux kernel, so this restriction has existed for
> a very long time. I'm guessing that growing or relocating a hugetlb
> mapping was never allowed. Perhaps the mremap man page should list this
> restriction.
I do not see any such mention:
http://man7.org/linux/man-pages/man2/mremap.2.html
The author(s) deliberately use the term "page aligned" without
specifying the page size that was used when creating the initial
mapping. And there is more:
    mremap() uses the Linux page table scheme. mremap() changes the
    mapping between virtual addresses and memory pages. This can be used
    to implement a very efficient realloc(3).
There is not much of a very efficient realloc(3) left if you cannot
modify mappings with a higher page size, is there?
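
To make the man page's claim concrete, here is a minimal sketch of
growing a mapping through mremap() instead of copying it. It assumes
Linux with glibc and an ordinary anonymous mapping with the default page
size; the helper names map_region() and grow_mapping() are hypothetical
and only serve the illustration.

```c
#define _GNU_SOURCE /* exposes mremap() and MREMAP_MAYMOVE in glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes of private anonymous memory
 * using the default page size. Returns NULL on failure. */
void *map_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: grow a mapping without copying its contents.
 * With MREMAP_MAYMOVE the kernel may relocate the mapping; the data
 * follows because only the page tables are updated.
 * Returns NULL on failure, in which case the old mapping stays valid. */
void *grow_mapping(void *old, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);
    void *p = mremap(old, old_size, new_size, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

Going by the check Mike describes, this same mremap() call is what
vma_to_resize() rejects once the original mapping was created with
MAP_HUGETLB, so grow_mapping() would return NULL in that case.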
> Is there a specific use case where the ability to grow hugetlb mappings
> is desired? Adding this functionality would involve more than simply
> removing the above if statement. One area of concern would be hugetlb
> huge page reservations. If there is a compelling use case, adding the
> functionality may be worth consideration. If not, I suggest we just
> document the limitation.
Paging was introduced to the x86 processor family with the 80386 in
1985, with 4 KiB pages by default. That was 32 years ago; modern CPUs in
the consumer market support 2 MiB and 1 GiB pages, and yet default
allocators usually stick to the default page size without checking
whether hugepages are actually available.
One 2 MiB page takes the place of 512 4 KiB pages in the TLB, seeing as
at least my TLBs each specialise in buffering one page size. I'm certain
that at some point in the future the need to deliberately reserve
hugepages via the kernel interface is going to be removed, and hugepages
will become the usual way of allocating memory.
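
The arithmetic behind that figure, as a small sketch. The function name
base_pages_per_huge_page() and the size macros are hypothetical; the
page sizes are the x86 ones mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define KIB ((size_t)1024)
#define MIB (1024 * KIB)
#define GIB (1024 * MIB)

/* Number of base pages, and therefore of separate translations, needed
 * to cover the address range of a single huge page. */
size_t base_pages_per_huge_page(size_t huge_page, size_t base_page)
{
    assert(base_page != 0 && huge_page % base_page == 0);
    return huge_page / base_page;
}
```

For example, base_pages_per_huge_page(2 * MIB, 4 * KIB) yields 512, and
a 1 GiB page in turn covers 512 pages of 2 MiB.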
As for the specific use case: I've written my own allocator that is not
bound by the same limitations as the usual malloc/realloc/free
allocators. As such I want to be able to eliminate as many page walks as
possible.
Just accepting the limitation would put Linux on the same level as
the Windows API, where no VirtualRealloc exists. My allocator needs to
work with Linux and Windows; for the latter I'm already managing a
table of consecutive mappings in user-space that, if a relocation has to
be made, creates an entirely new mapping into which the data of the
previous mappings is copied. This is redundant, because the kernel and
the process keep their own copies of the mapping table, and this is slow
because the kernel could just re-adjust the position within the address
space, whereas the process has to memcpy all the data from the old to
the new mappings.
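
A rough sketch of that copy-based fallback follows. It is not my actual
allocator code: it uses POSIX mmap()/munmap() purely for illustration
(on Windows the equivalent steps would go through the Virtual* family),
and the name grow_by_copy() is hypothetical.

```c
#define _GNU_SOURCE /* ensures MAP_ANONYMOUS is visible in strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: "relocate" a mapping the slow way. A new, larger
 * mapping is created, the old contents are copied over, and the old
 * mapping is released. Returns NULL on failure, in which case the old
 * mapping is left untouched. */
void *grow_by_copy(void *old, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);
    void *p = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, old, old_size); /* the copy that mremap() would avoid */
    munmap(old, old_size);
    return p;
}
```

The memcpy() touches every byte of the old mapping, so the cost grows
with the mapping size, whereas a kernel-side remap only has to update
page tables.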
Those are the very problems mremap was supposed to remove in the first
place. Merely documenting the limitation is the lazy way out, and it
will force implementers to work around it.
As for any kind of speed penalty that this might introduce (because
flags have to be checked, interfaces changed, and constants replaced):
hugepages will also reduce how often memory has to be allocated. My
allocator doesn't call the kernel each time it requires memory, but
only when it is absolutely necessary. The larger the mapping I can
allocate in one go, the longer that necessity can be postponed.