Re: [PATCH] mm: fix hugetlb page unmap count balance issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 17.05.23 00:34, Mike Kravetz wrote:
On 05/15/23 10:04, Mike Kravetz wrote:
On 05/12/23 16:29, Mike Kravetz wrote:
On 05/12/23 14:26, James Houghton wrote:
On Fri, May 12, 2023 at 12:20 AM Junxiao Chang <junxiao.chang@xxxxxxxxx> wrote:

This alone doesn't fix mapcounting for PTE-mapped HugeTLB pages. You
need something like [1]. I can resend it if that's what we should be
doing, but this mapcounting scheme doesn't work when the page structs
have been freed.

It seems like it was a mistake to include support for hugetlb memfds in udmabuf.

IIUC, it was added with commit 16c243e99d33 udmabuf: Add support for mapping
hugepages (v4).  Looks like it was never sent to linux-mm?  That is unfortunate
as hugetlb vmemmap freeing went in at about the same time.  And, as you have
noted udmabuf will not work if hugetlb vmemmap freeing is enabled.

Sigh!

Trying to think of a way forward.
--
Mike Kravetz


[1]: https://lore.kernel.org/linux-mm/20230306230004.1387007-2-jthoughton@xxxxxxxxxx/

- James

Adding people and list on Cc: involved with commit 16c243e99d33.

There are several issues with trying to map tail pages of hugetllb pages
not taken into account with udmabuf.  James spent quite a bit of time trying
to understand and address all the issues with the HGM code.  While using
the scheme proposed by James, may be an approach to the mapcount issue there
are also other issues that need attention.  For example, I do not see how
the fault code checks the state of the hugetlb page (such as poison) as none
of that state is carried in tail pages.

The more I think about it, the more I think udmabuf should treat hugetlb
pages as hugetlb pages.  They should be mapped at the appropriate level
in the page table.  Of course, this would impose new restrictions on the
API (mmap and ioctl) that may break existing users.  I have no idea how
extensively udmabuf is being used with hugetlb mappings.

Verified that using udmabug on a hugetlb mapping with vmemmap optimization will
BUG as:

[14106.812312] BUG: unable to handle page fault for address: ffffea000a7c4030
[14106.813704] #PF: supervisor write access in kernel mode
[14106.814791] #PF: error_code(0x0003) - permissions violation
[14106.815921] PGD 27fff9067 P4D 27fff9067 PUD 27fff8067 PMD 17ec34067 PTE 8000000285dab021
[14106.818489] Oops: 0003 [#1] PREEMPT SMP PTI
[14106.819345] CPU: 2 PID: 2313 Comm: udmabuf Not tainted 6.4.0-rc1-next-20230508+ #44
[14106.820906] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[14106.822679] RIP: 0010:page_add_file_rmap+0x2e/0x270

I started looking more closely at the driver and I do not fully understand the
usage model.  I took clues from the selftest and driver.  It seems the first
step is to create a buffer via the UDMABUF_CREATE ioctl.  This will copy 4K
pages from the page cache to an array associated with a file.  I did note that
hugetlb and shm behavior is different here as the driver can not add missing
hugetlb pages to the cache as it does with shm.  However, what seems more
concerning is that there is nothing to prevent the pages from being replaced
in the cache before being added to a udmabuf mapping.  This means udmabuf
mapping and original memfd could be operating on a different set of pages.
Is this acceptable, or somehow prevented?

In my role, I am more interested in udmabuf handling of hugetlb pages.
Trying to use individual 4K pages of hugetlb pages is something that
should be avoided here.  Would it be acceptable to change code so that
only whole hugetlb pages are used by udmabuf?  If not, then perhaps the
existing hugetlb support can be removed?

I'm wondering if that VMA shouldn't be some kind of special mapping (VM_PFNMAP), such that the struct page is entirely ignored?

I'm quite confused and concerned when I read that code (what the hell is it doing with shmem/hugetlb pages? why does the mapcount even get adjusted?)

This all has a bad smell to it, I hope I'm missing something important ...

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux