Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

"C.Wehrmeyer" <c.wehrmeyer@xxxxxx> · Thu, 19 Oct 2017 09:34:51 +0200

I apologise in case this message is going to arrive multiple times at 
the mailing list. I've had connection problems this morning while trying 
to push it through regardless, but it might or might not have been sent 
properly. I'm sorry for the inconvenience.

On 2017-10-08 18:47 Mike Kravetz wrote:
> You are correct.  That check in function vma_to_resize() will prevent
> mremap from growing or relocating hugetlb backed mappings.  This check
> existed in the 2.6.0 linux kernel, so this restriction has existed for
> a very long time.  I'm guessing that growing or relocating a hugetlb
> mapping was never allowed.  Perhaps the mremap man page should list this
> restriction.

I do not see any such mention there:

http://man7.org/linux/man-pages/man2/mremap.2.html

The author(s) deliberately use the term "page aligned", without
specifying the page size that was used when creating the initial
mapping. What is more:

> mremap() uses the Linux page table scheme.  mremap() changes the
> mapping between virtual addresses and memory pages.  This can be used
> to implement a very efficient realloc(3).

There is not much of a very efficient realloc(3) left if you cannot
modify mappings with a larger page size, is there?
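
For illustration, here is a minimal sketch of the "realloc via mremap"
idea from the man page. The helper names map_anon and resize_mapping are
hypothetical, not part of any API. With extra_flags set to 0 this uses
the default page size; passing MAP_HUGETLB to map_anon gives the kind of
mapping that, according to the check discussed above, mremap refuses to
grow or relocate with EINVAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes mremap() and MREMAP_MAYMOVE in <sys/mman.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of private, anonymous, zero-filled
 * memory. Pass MAP_HUGETLB in extra_flags to request a hugetlb mapping.
 * Returns NULL on failure (errno is set by mmap). */
static void *map_anon(size_t len, int extra_flags)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: resize a mapping the way a realloc(3) built on
 * mremap() would. The kernel may move the mapping (MREMAP_MAYMOVE), but
 * it only rewrites page tables; no user-space copy takes place.
 * Returns NULL on failure (errno is set by mremap); the old mapping is
 * then still valid. */
static void *resize_mapping(void *old, size_t old_len, size_t new_len)
{
    assert(new_len > 0);
    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would check the return value of resize_mapping and inspect
errno on NULL; that is where the EINVAL from the subject line would
show up for a hugetlb mapping.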

> Is there a specific use case where the ability to grow hugetlb mappings
> is desired?  Adding this functionality would involve more than simply
> removing the above if statement.  One area of concern would be hugetlb
> huge page reservations.  If there is a compelling use case, adding the
> functionality may be worth consideration.  If not, I suggest we just
> document the limitation.

Paging was introduced to the x86 processor family with the 80386 in
1985, with a default page size of 4 KiB. That was 32 years ago. Modern
CPUs in the consumer market support 2 MiB and 1 GiB pages, and yet
default allocators usually just stick to the default page size without
checking whether hugepages are actually available.

One 2 MiB page covers what would otherwise be 512 4 KiB pages, and it
relieves the 4 KiB TLB of those entries, seeing as at least my CPU's
TLBs are each specialised in buffering one page size. I'm certain that
at some point in the future the need for deliberately reserving
hugepages via the kernel interface is going to be removed, and hugepages
will become the usual way of allocating memory.
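
The arithmetic behind the 512 figure can be sketched as follows. The
helper pages_needed is a hypothetical name, and the sketch assumes one
TLB entry per mapped page.

```c
#include <assert.h>
#include <stddef.h>

#define KIB ((size_t)1 << 10)
#define MIB ((size_t)1 << 20)
#define GIB ((size_t)1 << 30)

/* Number of pages (and, assuming one TLB entry per page, the number of
 * distinct translations) needed to cover `bytes`, rounding up. */
static size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}
```

pages_needed(2 * MIB, 4 * KIB) is 512, and for a 1 GiB working set the
count drops from 262144 translations with 4 KiB pages to 512 with 2 MiB
pages.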

As for the specific use case: I've written my own allocator that is not
bound by the same limitations as the usual malloc/realloc/free
allocators. As such I want to be able to eliminate as many page walks as
possible.

Just accepting the limitation would put Linux on the same level as the
Windows API, where no VirtualRealloc exists. My allocator needs to work
with Linux and Windows; for the latter I'm already managing a table of
consecutive mappings in user space. If a relocation has to be made, it
creates an entirely new mapping into which the data of the previous
mappings is copied. This is redundant, because the kernel and the
process each keep their own copy of the mapping table, and it is slow,
because the kernel could just re-adjust the position within the address
space, whereas the process has to memcpy all the data from the old to
the new mappings.
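
A minimal sketch of that copy-based fallback, written against mmap
rather than the Windows API, assuming a single private anonymous mapping.
The function name relocate_by_copy is hypothetical, and the table of
mappings is left out. extra_flags may be 0 or MAP_HUGETLB.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* makes sure MAP_ANONYMOUS and MAP_HUGETLB are visible */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical fallback for when the kernel cannot resize a mapping in
 * place: create a fresh mapping of new_len bytes, copy the old contents
 * over, then drop the old mapping. Returns NULL on failure, in which
 * case the old mapping is left untouched. */
static void *relocate_by_copy(void *old, size_t old_len, size_t new_len,
                              int extra_flags)
{
    assert(old != NULL);
    void *fresh = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    if (fresh == MAP_FAILED)
        return NULL;

    /* This copy is the cost that mremap() avoids by moving page table
     * entries instead of data. */
    memcpy(fresh, old, old_len < new_len ? old_len : new_len);
    munmap(old, old_len);
    return fresh;
}
```

Note that both mappings exist at the same time during the copy, so the
peak memory use is old_len plus new_len.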

Those are the very problems mremap was supposed to remove in the first
place. Merely documenting the limitation is the lazy way out, and it
will force implementers to work around it.

As for any kind of speed penalty that this might introduce (because
flags have to be checked, interfaces to be changed, and constants to be
replaced): hugepages will also reduce how often memory has to be
allocated. My allocator doesn't call the kernel each time it requires
memory, but only when it is absolutely necessary. The larger the mapping
I can allocate in one go, the longer that necessity can be postponed.