On 2/16/21 12:12 AM, Michal Hocko wrote:
> On Tue 16-02-21 03:07:13, Eiichi Tsukata wrote:
>> Hugepages can be preallocated to avoid unpredictable allocation latency.
>> If we run into a 4k page shortage, the kernel can trigger OOM even though
>> there were free hugepages. When OOM is triggered by the user address page
>> fault handler, we can use an oom notifier to free hugepages in user space,
>> but if it's triggered by a memory allocation for the kernel, there is no
>> way to synchronously handle it in user space.
>
> Can you expand some more on what kind of problem you see?
> Hugetlb pages are, by definition, a preallocated, unreclaimable and
> admin-controlled pool of pages. Under those conditions it is expected
> and required that the sizing be done very carefully. Why is that a
> problem in your particular setup/scenario?
>
> If the sizing is really done properly and then a random process can
> trigger OOM, this can lead to malfunctioning of those workloads
> which do depend on the hugetlb pool, right? So isn't this a kind of DoS
> scenario?

I spent a bunch of time last year looking at OOMs or near-OOMs on
systems where there were a bunch of free hugetlb pages. The number of
hugetlb pages was carefully chosen by the DB for its expected needs.
Some other services running on the system were actually driving/causing
the OOM situations. If a feature like this had been in place, it could
have caused a DoS scenario, as Michal suggested.

However, this is an 'opt in' feature. So, I would not expect anyone who
carefully plans the size of their hugetlb pool to enable such a feature.
If there is a use case where hugetlb pages are used in a non-essential
application, this might be of use.
--
Mike Kravetz
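[Editor's note: for readers unfamiliar with the preallocation being discussed, a minimal sketch of how an admin typically reserves and inspects the hugetlb pool; the count of 512 is purely illustrative, and real sizing must be matched carefully to the workload, as the thread stresses.]

```shell
# Reserve 512 default-size (typically 2 MiB) hugepages at runtime.
# Requires root; may reserve fewer than requested if memory is fragmented.
echo 512 > /proc/sys/vm/nr_hugepages
# Equivalent sysctl form:
#   sysctl -w vm.nr_hugepages=512

# Verify how many pages were actually reserved and how many are free.
grep -i huge /proc/meminfo
```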