Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

I apologise if this message arrives at the mailing list more than once. I had connection problems this morning while trying to send it, so I cannot tell whether an earlier attempt went through. I'm sorry for the inconvenience.

On 2017-10-08 18:47 Mike Kravetz wrote:
> You are correct.  That check in function vma_to_resize() will prevent
> mremap from growing or relocating hugetlb backed mappings.  This check
> existed in the 2.6.0 linux kernel, so this restriction has existed for
> a very long time.  I'm guessing that growing or relocating a hugetlb
> mapping was never allowed.  Perhaps the mremap man page should list this
> restriction.

I do not see any mention of it:

http://man7.org/linux/man-pages/man2/mremap.2.html

The author(s) deliberately use the term "page aligned" without specifying the page size that was used to create the initial mapping. What is more:

mremap() uses the Linux page table scheme.  mremap() changes the
mapping between virtual addresses and memory pages.  This can be used
to implement a very efficient realloc(3).

There is not much of a very efficient realloc(3) left if you cannot modify mappings that use a larger page size, is there?
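
To make the restriction concrete, here is a minimal sketch that maps a region and then asks mremap() to grow it. The helper name try_grow and the sizes are made up for illustration. It assumes Linux with glibc, and the huge page call assumes a default huge page size of 2 MiB, as on typical x86-64 systems. The result for the hugetlb case depends on the kernel version and on whether any huge pages are reserved, so treat it as a probe rather than a guarantee.

```c
#define _GNU_SOURCE          /* needed for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes with the given extra mmap flags
 * (0 or MAP_HUGETLB), then try to grow the mapping to twice its size.
 * Returns 0 if mremap() succeeded, the errno set by mremap() if it
 * failed, or -1 if the initial mmap() failed (for example because no
 * huge pages are reserved).
 */
int try_grow(size_t len, int extra_flags)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    void *q = mremap(p, len, 2 * len, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        int err = errno;     /* save before munmap() can touch errno */
        munmap(p, len);
        return err;
    }

    munmap(q, 2 * len);
    return 0;
}
```

Calling try_grow(4096, 0) should return 0 on an ordinary mapping. Calling try_grow(2UL * 1024 * 1024, MAP_HUGETLB) is the case discussed in this thread: with the check in vma_to_resize() in place it would be expected to return EINVAL, provided the initial mmap() succeeds at all.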

> Is there a specific use case where the ability to grow hugetlb mappings
> is desired?  Adding this functionality would involve more than simply
> removing the above if statement.  One area of concern would be hugetlb
> huge page reservations.  If there is a compelling use case, adding the
> functionality may be worth consideration.  If not, I suggest we just
> document the limitation.

Paging was introduced to the x86 processor family with the 80386 in 1985, with a page size of 4 KiB. That was 32 years ago. Modern consumer CPUs support 2 MiB and 1 GiB pages, and yet default allocators usually stick to 4 KiB pages without checking whether hugepages are actually available.

One 2-MiB page keeps 512 4-KiB entries out of the TLB, seeing as at least my TLBs are each specialised in caching a single page size. I'm certain that at some point in the future the need to explicitly reserve hugepages via the kernel interface will go away, and hugepages will become the usual way of allocating memory.
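
The figure of 512 is simply the ratio of the two page sizes. As a small worked example, the following sketch computes how many pages, and therefore how many TLB entries, are needed to cover a region. The function name pages_needed is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of pages (and thus TLB entries) needed to cover `bytes`
 * with pages of `page_size` bytes, rounding up to whole pages.
 */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

Covering 2 MiB takes pages_needed(2UL << 20, 4096) = 512 entries with 4-KiB pages, but a single entry with one 2-MiB page.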

As for the specific use case: I've written my own allocator that is not bound by the same limitations as the usual malloc/realloc/free allocators. As such, I want to eliminate as many page walks as possible.

Just accepting the limitation would put Linux on the same level as the Windows API, where no VirtualRealloc exists. My allocator needs to work on Linux and Windows; for the latter I already manage a table of consecutive mappings in user space and, if a relocation is needed, create an entirely new mapping into which the data of the previous mappings is copied. This is redundant, because the kernel and the process each keep their own copy of the mapping table, and it is slow, because the kernel could just adjust the position within the address space, whereas the process has to memcpy all the data from the old mappings to the new one.
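
The user-space workaround described above can be sketched roughly as follows. This is not the allocator discussed in this mail, only an illustration under stated assumptions: the names copy_relocate and resize_mapping are hypothetical, the mapping is anonymous and private, and `flags` holds the mmap flags the region was originally created with (for example including MAP_HUGETLB).

```c
#define _GNU_SOURCE          /* needed for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Fallback path: create a fresh mapping, copy the payload, drop the old
 * mapping. Returns the new address, or MAP_FAILED with the old mapping
 * left untouched.
 */
void *copy_relocate(void *old, size_t old_len, size_t new_len, int flags)
{
    assert(old_len > 0 && new_len > 0);

    void *p = mmap(NULL, new_len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    memcpy(p, old, old_len < new_len ? old_len : new_len);
    munmap(old, old_len);
    return p;
}

/*
 * Let the kernel resize or move the mapping if it is willing to;
 * otherwise pay for the copy in user space.
 */
void *resize_mapping(void *old, size_t old_len, size_t new_len, int flags)
{
    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    if (p != MAP_FAILED)
        return p;            /* no data was copied by the process */

    return copy_relocate(old, old_len, new_len, flags);
}
```

For ordinary mappings the first branch is taken and only page table entries change. For a hugetlb mapping that mremap() rejects, every resize goes through the memcpy() in copy_relocate.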

Those are the very problems mremap was supposed to remove in the first place. Merely documenting the limitation is the lazy way out and will force implementers to work around it.

As for any kind of speed penalty that this might introduce (because flags have to be checked, interfaces changed, and constants replaced): hugepages also reduce how often memory has to be allocated. My allocator does not call the kernel every time it needs memory, only when it is absolutely necessary. The larger the mapping I can allocate in one go, the longer that necessity can be postponed.
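
The idea of calling the kernel only when necessary can be illustrated with a very small bump arena. This is a generic sketch, not the allocator discussed in this mail. The struct and function names are hypothetical, the 16-byte granularity is an arbitrary choice, and the arena size is assumed to be a multiple of the huge page size so that the MAP_HUGETLB attempt has a chance to succeed.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS and MAP_HUGETLB */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bump arena: one large mapping, carved up in user space. */
struct arena {
    char  *base;   /* start of the mapping */
    size_t size;   /* total bytes mapped */
    size_t used;   /* bytes handed out so far */
};

/*
 * Map `size` bytes once. Huge pages are tried first; if none are
 * available the arena falls back to normal pages.
 * Returns 0 on success, -1 on failure.
 */
int arena_init(struct arena *a, size_t size)
{
    assert(size > 0 && size % 16 == 0);

    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/*
 * Hand out `n` bytes, rounded up to a multiple of 16, without entering
 * the kernel. Returns NULL once the arena is exhausted.
 */
void *arena_alloc(struct arena *a, size_t n)
{
    size_t left = a->size - a->used;   /* always a multiple of 16 */
    if (n > left)
        return NULL;

    n = (n + 15) & ~(size_t)15;        /* still <= left after rounding */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}
```

Only arena_init makes a system call. The larger the initial mapping, the more arena_alloc calls can be served before the process has to go back to the kernel, which is where a resizable hugetlb mapping would help.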
