On 17.08.21 18:19, Mike Kravetz wrote:
On 8/17/21 12:30 AM, David Hildenbrand wrote:
On 17.08.21 03:46, Andrew Morton wrote:
On Mon, 16 Aug 2021 17:46:58 -0700 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
It really is a ton of new code. I think we're owed much more detail
about the problem than the above. To be confident that all this
material is truly justified?
The desired functionality for this specific use case is to simply
convert a 1G hugetlb page to 512 2MB hugetlb pages. As mentioned
"Converting larger to smaller hugetlb pages can be accomplished today by
first freeing the larger page to the buddy allocator and then allocating
the smaller pages. However, there are two issues with this approach:
1) This process can take quite some time, especially if allocation of
the smaller pages is not immediate and requires migration/compaction.
2) There is no guarantee that the total size of smaller pages allocated
will match the size of the larger page which was freed. This is
because the area freed by the larger page could quickly be
fragmented."
These two issues have been experienced in practice.
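
For readers who want the free-then-allocate workaround spelled out, here is a minimal sketch in Python. It is an illustration, not part of the posted series. The helper names are hypothetical, and the sysfs paths assume the usual per-size hugetlb directories under /sys/kernel/mm/hugepages (x86-64 sizes shown). Writing to those files needs root, so the function that touches sysfs is defined but never called here.

```python
def pages_after_demote(src_kb: int, dst_kb: int) -> int:
    """Number of dst-sized pages that exactly cover one src-sized page."""
    if dst_kb <= 0 or src_kb <= dst_kb:
        raise ValueError("source page must be larger than target page")
    if src_kb % dst_kb:
        raise ValueError("source size must be a multiple of target size")
    return src_kb // dst_kb


def nr_hugepages_path(size_kb: int) -> str:
    """Per-size pool file, e.g. hugepages-2048kB/nr_hugepages."""
    return f"/sys/kernel/mm/hugepages/hugepages-{size_kb}kB/nr_hugepages"


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def free_then_allocate(src_kb: int = 1048576, dst_kb: int = 2048) -> int:
    """Shrink the large pool by one page, then try to grow the small pool.

    Returns how many small pages were actually gained. Needs root and
    configured hugetlb pools. The result can be lower than
    pages_after_demote() if the freed range is fragmented before the
    second write completes (issue 2 above).
    """
    src, dst = nr_hugepages_path(src_kb), nr_hugepages_path(dst_kb)
    wanted = pages_after_demote(src_kb, dst_kb)
    before = _read_int(dst)
    with open(src, "w") as f:   # free one large page to the buddy allocator
        f.write(str(_read_int(src) - 1))
    with open(dst, "w") as f:   # request the equivalent number of small pages
        f.write(str(before + wanted))
    return _read_int(dst) - before


# 1 GiB = 1048576 kB, 2 MiB = 2048 kB
print(pages_after_demote(1048576, 2048))
```

The gap between the value returned by free_then_allocate() and the count from pages_after_demote() is one way the second issue could be measured on a real system.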
Well the first issue is quantifiable. What is "some time"? If it's
people trying to get a 5% speedup on a rare operation because hey,
bugging the kernel developers doesn't cost me anything then perhaps we
have better things to be doing.
And the second problem would benefit from some words to help us
understand how much real-world hurt this causes, and how frequently.
And let's understand what the userspace workarounds look like, etc.
A big chunk of the code changes (approx. 50%) is for the vmemmap
optimizations. This is also the most complex part of the changes.
I added the code as interaction with vmemmap reduction was discussed
during the RFC. It is only a performance enhancement and honestly
may not be worth the cost/risk. I will get some numbers to measure
the actual benefit.
If it really makes that much of a difference code/complexity wise, would it make sense to just limit demote functionality to the !vmemmap case for now?
Handling vmemmap optimized huge pages is not that big of a deal. We
just use the existing functionality to populate vmemmap for the page
being demoted, and free vmemmap for resulting pages of demoted size.
This obviously is not 'optimal' for demote as we will allocate more
vmemmap pages than needed and then free the excess pages. The complex
part is not over allocating vmemmap and only sparsely populating vmemmap
for the target pages of demote size. This is all done in patches 6-8.
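
A rough back-of-the-envelope sketch of the over-allocation described above, in Python. The constants are assumptions and not taken from this thread: 4 KiB base pages, a 64-byte struct page, and 2 vmemmap pages retained per vmemmap-optimized hugetlb page. The function names are hypothetical.

```python
BASE_PAGE_BYTES = 4096      # assumed base page size
STRUCT_PAGE_BYTES = 64      # assumed sizeof(struct page)
RESERVED_VMEMMAP_PAGES = 2  # assumed vmemmap pages kept per optimized hugetlb page


def full_vmemmap_pages(hpage_bytes: int) -> int:
    """vmemmap pages needed to describe every base page of one huge page."""
    nr_base_pages = hpage_bytes // BASE_PAGE_BYTES
    return nr_base_pages * STRUCT_PAGE_BYTES // BASE_PAGE_BYTES


def demote_vmemmap_cost(src_bytes: int, dst_bytes: int) -> dict:
    """Compare the simple populate-then-free path with sparse population."""
    nr_dst = src_bytes // dst_bytes
    # Simple path: restore the full vmemmap of the source page, then free
    # the excess for each resulting page of the demoted size.
    simple_alloc = full_vmemmap_pages(src_bytes) - RESERVED_VMEMMAP_PAGES
    simple_free = nr_dst * (full_vmemmap_pages(dst_bytes) - RESERVED_VMEMMAP_PAGES)
    # Sparse path: allocate only what the target pages will retain.
    sparse_alloc = nr_dst * RESERVED_VMEMMAP_PAGES - RESERVED_VMEMMAP_PAGES
    return {
        "simple_alloc": simple_alloc,
        "simple_free": simple_free,
        "sparse_alloc": sparse_alloc,
    }


GIB = 1 << 30
MIB = 1 << 20
print(demote_vmemmap_cost(GIB, 2 * MIB))
```

Under these assumptions the simple path allocates 4094 vmemmap pages and then frees 3072 of them, while sparse population would allocate 1022 and free none. Both end with 1024 vmemmap pages in use.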
I am happy to drop these patches for now. They are the most complex (and
ugly) of this series. As mentioned, they do not provide any additional
functionality.
Just looking at the diffstat, that looks like a good idea to me :)
--
Thanks,
David / dhildenb