On Sun, Jul 02, 2023 at 10:58:47PM -0700, Sidhartha Kumar wrote:
> @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf)
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
>  		return ret;
>  	if (unlikely(!(ret & VM_FAULT_LOCKED))) {
> -		lock_page(page);
> -		if (!page->mapping) {
> -			unlock_page(page);
> +		folio_lock(folio);
> +		if (!folio_mapping(folio)) {
> +			folio_unlock(folio);

I promised to explain this better once I had time, and I have time now.

folio->mapping is used for a multitude of purposes, unfortunately.
Maybe some future work will reduce that, but for now, These Are The Rules.

If the folio is marked as being Slab, it's used for something else.
The folio does not belong to an address space (nor can it be mapped,
so we're not going to see it here, but sometimes we see it in other
contexts where we call folio_mapping()).

The bottom two bits are used as PAGE_MAPPING_FLAGS. If they're both 0,
this folio belongs to a file, and the rest of folio->mapping is a pointer
to a struct address_space. Or they're both 0 because the whole thing
is NULL. More on that below.

If the bottom two bits are 01b, this is an anonymous folio, and
folio->mapping is actually a pointer to an anon_vma (which is not the
same thing as an anon vma).

If the bottom two bits are 10b, this is a Movable page (anon & file
memory is also movable, but this is different). The folio->mapping
points to a struct movable_operations.

If the bottom two bits are 11b, this is a KSM allocation, and
folio->mapping points to a struct ksm_stable_node.

When we remove a folio from the page cache, we reset folio->mapping
to NULL. We often remove folios from the page cache before their
refcount drops to zero (the common case is to look up the folio in the
page cache, which grabs a reference, remove the folio from the page cache
which decrements the refcount, then put the folio which might be the
last refcount).
So it's entirely possible to see a folio in this function
with a NULL mapping; that means it's been removed from the file through
a truncate or similar, and we need to fail the mkwrite. Userspace is
about to get a segfault.

If you find all of that confusing, well, I agree, and I'm trying to
simplify it.

So, with all that background, what's going on here?

Part of the "modern" protocol for handling page faults is to lock the
folio in vm_ops->page_mkwrite. But we still support (... why?) drivers
that haven't been updated. They return 0 on success instead of
VM_FAULT_LOCKED. So we take the lock for them, then check that the
folio wasn't truncated, and bail out if it looks like it was.

If we have a really old-school driver that has allocated a page, mapped
it to userspace, and set page->mapping to be, e.g., Movable, then by
calling folio_mapping() instead of reading folio->mapping directly,
we'll see NULL instead of a non-NULL value, mistakenly believe the folio
has been truncated, and enter an endless loop.

Am I being paranoid here? Maybe! Drivers should have been updated by
now. The "modern" way was introduced in 2007 (commit d0217ac04ca6), so
it'd be nice to turn this into a WARN_ON_ONCE so drivers fix their code.
There are only ~30 implementations of page_mkwrite in the kernel, so it
might not take too long to check everything's OK.
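
For anyone who prefers to read the rules as code, here is a rough
userspace sketch of the tag bits and of why the two checks differ. It is
not kernel code: struct model_folio, tag_mapping(), model_mapping_kind()
and model_folio_mapping() are made-up names, the flag values are taken
from the description above, and the model deliberately leaves out the
Slab case and any other checks the real folio_mapping() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low-bit tags as described above (values assumed from the text). */
#define PAGE_MAPPING_ANON	0x1UL
#define PAGE_MAPPING_MOVABLE	0x2UL
#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

/* Hypothetical stand-ins for the kernel structures. */
struct address_space { int dummy; };
struct model_folio { void *mapping; };

enum mapping_kind {
	MAPPING_NONE,		/* NULL: truncated or never set */
	MAPPING_FILE,		/* 00b: struct address_space pointer */
	MAPPING_ANON,		/* 01b: anon_vma pointer */
	MAPPING_MOVABLE,	/* 10b: movable_operations pointer */
	MAPPING_KSM,		/* 11b: ksm_stable_node pointer */
};

/* Store a tag in the two low bits of a suitably aligned pointer. */
static inline void *tag_mapping(void *ptr, unsigned long tag)
{
	assert(((uintptr_t)ptr & PAGE_MAPPING_FLAGS) == 0);
	return (void *)((uintptr_t)ptr | tag);
}

/* Classify the raw mapping word by its bottom two bits. */
static inline enum mapping_kind
model_mapping_kind(const struct model_folio *folio)
{
	uintptr_t raw = (uintptr_t)folio->mapping;

	if (raw == 0)
		return MAPPING_NONE;

	switch (raw & PAGE_MAPPING_FLAGS) {
	case 0:
		return MAPPING_FILE;
	case PAGE_MAPPING_ANON:
		return MAPPING_ANON;
	case PAGE_MAPPING_MOVABLE:
		return MAPPING_MOVABLE;
	default:
		return MAPPING_KSM;
	}
}

/*
 * Simplified model of a folio_mapping()-style helper: anything that is
 * not a plain address_space pointer is reported as NULL.
 */
static inline struct address_space *
model_folio_mapping(const struct model_folio *folio)
{
	if ((uintptr_t)folio->mapping & PAGE_MAPPING_FLAGS)
		return NULL;
	return folio->mapping;
}
```

In this model, a folio whose mapping carries the Movable tag has a
non-NULL raw mapping word, yet model_folio_mapping() returns NULL for
it. That is the difference between the old `!page->mapping` test and the
new `!folio_mapping(folio)` test that the concern above is about.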