Re: [Question] CoW on VM_PFNMAP vma during write fault

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 27.02.24 13:28, Wupeng Ma wrote:
We find that a warn will be produced during our test, the detail log is
shown in the end.

The core problem of this warn is that the first pfn of this pfnmap vma is
cleared during memory-failure. Digging into the source we find that this
problem can be triggered as following:

// mmap with MAP_PRIVATE and specific fd which hook mmap
mmap(MAP_PRIVATE, fd)
   __mmap_region
     remap_pfn_range
     // set vma with pfnmap and the prot of pte is read only
	

Okay, so we get a MAP_PRIVATE VM_PFNMAP I assume.

What fd is that exactly? Often, we disallow private mappings in the mmap() callback (for a good reason).

// memset this memory with trigger fault
handle_mm_fault
   __handle_mm_fault
     handle_pte_fault
       // write fault and !pte_write(entry)
       do_wp_page
         wp_page_copy // this will alloc a new page with valid page struct
                      // for this pfnmap vma

Here we replace the mapped PFNMAP thingy by a proper anon folio.


// inject a hwpoison to the first page of this vma

I assume this is an anon folio?

madvise_inject_error
   memory_failure
     hwpoison_user_mappings
       try_to_unmap_one
         // mark this pte as invalid (hwpoison)
         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
                 address, range.end);

// during unmap vma, the first pfn of this pfnmap vma is invalid
vm_mmap_pgoff
   do_mmap
     __do_mmap_mm
       __mmap_region
         __do_munmap
           unmap_region
             unmap_vmas
               unmap_single_vma
                 untrack_pfn
                   follow_phys // pte is already invalidate, WARN_ON here

unmap_single_vma()->...->zap_pte_range() should do the right thing when calling vm_normal_page().

untrack_pfn() is the problematic part.


CoW with a valid page for pfnmap vma is weird to us. Can we use
remap_pfn_range for private vma(read only)? Once CoW happens on a pfnmap
vma during write fault, this page is normal(page flag is valid) for most mm
subsystems, such as memory failure in thais case and extra should be done to
handle this special page.

During unmap, if this vma is pfnmap, unmap shouldn't be done since page
should not be touched for pfnmap vma.

But the root problem is that can we insert a valid page for pfnmap vma?

Any thoughts to solve this warn?

vm_normal_page() documentation explains how that magic is supposed to work. vm_normal_page() should be able to correctly identify whether we want to look at the struct page for an anon folio that was COWed.


untrack_pfn() indeed does not seem to be well prepared for handling MAP_PRIVATE mappings where we end up having anon folios.

I think it will already *completely mess up* simply when unmapping the range without the memory failure involved.

See, follow_phys() would get the PFN of the anon folio and then untrack_pfn() would do some nonesense with that. Completely broken.

The WARN is just a side-effect of the brokenness.

In follow_phys(), we'd likely have to call vm_normal_page(). If we get a page back, we'd likely have to fail follow_phys() instead of returning a PFN of an anon folio.

Now, how do we fix untrack_pfn() ? I really don't know. In theory, we might no longer have *any* PFNMAP PFN in there after COW'ing everything.

Sounds like MAP_PRIVATE VM_PFNMAP + __HAVE_PFNMAP_TRACKING is some broken garbage (sorry). Can we disallow it?

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux