Re: [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'm looking to fix this problem in my code, but am struggling to see how the
current code is safe. I'm thinking about the following scenario:


Let's see :)

  - A page is CoW mapped into processes A and B.
  - The page takes a fault in process A, and do_wp_page() determines that it is
    "maybe-shared" and therefore must copy. So drops the PTL and calls
    wp_page_copy().

Note that before calling wp_page_copy(), we do a folio_get(folio); Further, the page table reference is only dropped once we actually replace the page in the page table. So while in wp_page_copy(), the folio should have at least 2 references if the page is still mapped.

  - Process B exits.
  - Another thread in process A faults on the page. This time dw_wp_page()
    determines that the page is exclusive (due to the ref count), and reuses it,
    marking it exclusive along the way.

The refcount should not be 1 (other reference from the wp_page_copy() caller), so A won't be able to reuse it, and ...

  - wp_page_copy() from the original thread in process A retakes the PTL and
    copies the _now exclusive_ page.

Having typed it up, I guess this can't happen, because wp_page_copy() will only
do the copy if the PTE hasn't changed and it will have changed because it is now
writable? So this is safe?

this applies as well. If the pte changed (when reusing due to a write failt it's now writable, or someone else broke COW), we back off. For FAULT_FLAG_UNSHARE, however, the PTE may not change. But the additional reference should make it work.

I think it works as intended. It would be clearer if we'd also recheck in wp_page_copy() whether we still don't have an exclusive anon page under PT lock -- and if we would, back off.


To make things more convoluted, what happens if the second thread does an
mprotect() to make the page RO after its write fault was handled? I think
mprotect() will serialize on the mmap write lock so this is safe too?

Yes, mprotect() synchronizes that. There are other mechanisms to write-protect a page, though, under mmap lock in read mode (uffd-wp). So it's a valid concern.

In all of these cases, reuse should be prevented due to the additional reference on the folio when entering wp_page_copy() right from the start, not turning the page exclusive but instead replacing it by a copy. An additional sanity check sounds like the right thing to do.


Sorry if this is a bit rambly, just trying to make sure I've understood
everything correctly.

It's a very interesting corner case, thanks for bringing that up. I think the old mapcount based approach could have suffered from this theoretical issue, but I might be wrong.

--
Thanks,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux