I'm looking to fix this problem in my code, but am struggling to see how the current code is safe. I'm thinking about the following scenario:
Let's see :)
- A page is CoW mapped into processes A and B. - The page takes a fault in process A, and do_wp_page() determines that it is "maybe-shared" and therefore must copy. So drops the PTL and calls wp_page_copy().
Note that before calling wp_page_copy(), we do a folio_get(folio); Further, the page table reference is only dropped once we actually replace the page in the page table. So while in wp_page_copy(), the folio should have at least 2 references if the page is still mapped.
- Process B exits. - Another thread in process A faults on the page. This time dw_wp_page() determines that the page is exclusive (due to the ref count), and reuses it, marking it exclusive along the way.
The refcount should not be 1 (other reference from the wp_page_copy() caller), so A won't be able to reuse it, and ...
- wp_page_copy() from the original thread in process A retakes the PTL and copies the _now exclusive_ page. Having typed it up, I guess this can't happen, because wp_page_copy() will only do the copy if the PTE hasn't changed and it will have changed because it is now writable? So this is safe?
this applies as well. If the pte changed (when reusing due to a write failt it's now writable, or someone else broke COW), we back off. For FAULT_FLAG_UNSHARE, however, the PTE may not change. But the additional reference should make it work.
I think it works as intended. It would be clearer if we'd also recheck in wp_page_copy() whether we still don't have an exclusive anon page under PT lock -- and if we would, back off.
To make things more convoluted, what happens if the second thread does an mprotect() to make the page RO after its write fault was handled? I think mprotect() will serialize on the mmap write lock so this is safe too?
Yes, mprotect() synchronizes that. There are other mechanisms to write-protect a page, though, under mmap lock in read mode (uffd-wp). So it's a valid concern.
In all of these cases, reuse should be prevented due to the additional reference on the folio when entering wp_page_copy() right from the start, not turning the page exclusive but instead replacing it by a copy. An additional sanity check sounds like the right thing to do.
Sorry if this is a bit rambly, just trying to make sure I've understood everything correctly.
It's a very interesting corner case, thanks for bringing that up. I think the old mapcount based approach could have suffered from this theoretical issue, but I might be wrong.
-- Thanks, David / dhildenb