Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support

On 1/27/22 03:57, David Hildenbrand wrote:
> On 13.01.22 19:03, Mike Kravetz wrote:
>> Userfaultfd selftests for hugetlb do not perform UFFD_EVENT_REMAP
>> testing.  However, mremap support was recently added in commit
>> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
>> vma").  While attempting to enable mremap support in the test, it was
>> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>>
>> hugetlb does not support MADV_DONTNEED.  However, the only thing
>> preventing support is a check in can_madv_lru_vma().  Simply removing
>> the check will enable support.
>>
>> This is sent as an RFC because there is no existing use case calling
>> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
>> However, adding support makes sense as it is fairly trivial and brings
>> hugetlb functionality more in line with 'normal' memory.
>>
> 
> Just a note:
> 
> QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
> but instead always goes either via hugetlbfs or via memfd. 
> 
> For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
> to get the job done (IOW: also discards private anon pages). See the
> comments in the QEMU code below. I remember that that is somewhat
> inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
> always need fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
> to make sure
> 
> a) All file pages are removed
> b) All private anon pages are removed
> 
> IIRC hugetlbfs really is different in that regard, but maybe other fs
> behave similarly.
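The two-step discard described in the quoted note can be sketched in C as below. This is a minimal illustration, not QEMU's code: it assumes Linux with glibc 2.27 or later (for memfd_create()) and uses an ordinary memfd (tmpfs-backed), not hugetlbfs, so it shows the "normal" file behavior. The helper names discard_private_range() and demo_discard() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>    /* fallocate(), FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stddef.h>
#include <sys/mman.h> /* mmap(), madvise(), memfd_create() */
#include <unistd.h>   /* ftruncate(), pwrite(), sysconf(), close() */

/*
 * Hypothetical helper: discard [off, off + len) of a MAP_PRIVATE file
 * mapping. `addr` is the page-aligned address that maps file offset `off`.
 * The `punch`/`dontneed` flags only exist so the demo can show each step.
 */
static int discard_private_range(int fd, char *addr, off_t off, size_t len,
                                 int punch, int dontneed)
{
    /* a) remove the file pages backing the range (leaves a hole) */
    if (punch && fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                           off, (off_t)len) != 0)
        return -1;
    /* b) drop this process's private anon (COW) copies of those pages */
    if (dontneed && madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;
    return 0;
}

/*
 * Demo on a memfd (tmpfs-backed, NOT hugetlbfs). Returns the first byte
 * seen through the private mapping after the discard, or -1 on error.
 */
static int demo_discard(int punch, int dontneed)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("discard-demo", 0);
    if (fd < 0)
        return -1;

    const char f = 'F'; /* file contents at offset 0 */
    if (ftruncate(fd, page) != 0 || pwrite(fd, &f, 1, 0) != 1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                   fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = 'A'; /* write fault: creates a private anon COW copy */

    int seen = -1;
    if (discard_private_range(fd, p, 0, (size_t)page, punch, dontneed) == 0)
        seen = (unsigned char)p[0]; /* may refault from the file */

    munmap(p, (size_t)page);
    close(fd);
    return seen;
}
```

On a memfd, demo_discard(1, 1) should observe 0 (hole in the file, private copy dropped), demo_discard(0, 1) should observe 'F' (the file page is faulted back in), and demo_discard(1, 0) should observe 'A' (the hole punch leaves the private COW copy in place). Per this thread, hugetlbfs differs in that last case: the hole punch also removes the private pages.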

Yes, hugetlbfs really is different.  Some might even consider it a bug.
Imagine if those private anon (COW) pages contain important data.  They
could be unmapped/freed by some other process that has write access to
the hugetlb file on which the private mapping is based.

I believe this same issue exists for hugetlbfs ftruncate.  When fallocate
hole punch support was added, it was based on the ftruncate functionality.

I am hesitant to change the behavior of hugetlb hole punch or truncate
as people may be relying on that behavior today.  Your QEMU example is
one such case.

Thanks,
-- 
Mike Kravetz



