On Mon, Apr 24, 2023 at 07:22:03PM +0100, Lorenzo Stoakes wrote:

> OK I guess you mean the folio lock :) Well there is
> unpin_user_pages_dirty_lock() and unpin_user_page_range_dirty_lock() and
> also set_page_dirty_lock() (used by __access_remote_vm()) which should
> avoid this.

It has been a while, but IIRC, these are all basically racy; the
comment in front of set_page_dirty_lock() even says it is racy.

The race is that a FS cleans a page and thinks it cannot become dirty,
and then it becomes dirty - and all variations of that.

Looking around a bit, I suppose what I'd expect to see is a sequence
sort of like what do_page_mkwrite() does:

        /* Synchronize with the FS and get the page locked */
        ret = vmf->vma->vm_ops->page_mkwrite(vmf);
        if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
                return ret;
        if (unlikely(!(ret & VM_FAULT_LOCKED))) {
                lock_page(page);
                if (!page->mapping) {
                        unlock_page(page);
                        return 0; /* retry */
                }
                ret |= VM_FAULT_LOCKED;
        } else
                VM_BUG_ON_PAGE(!PageLocked(page), page);

        /* Write to the page with the CPU */
        va = kmap_local_page(page);
        memcpy(va, ....);
        kunmap_local(va);

        /* Tell the FS and unlock it. */
        set_page_dirty(page);
        unlock_page(page);

I don't know if this is exactly right, but it seems closerish.

So maybe some kind of GUP interface that returns single locked pages
is the right direction? IDK

Or maybe we just need to make a memcpy primitive that works while
holding the PTLs?

> We definitely need to keep ptrace and /proc/$pid/mem functioning correctly,
> and I given the privilege levels required I don't think there's a security
> issue there?

Even root is not allowed to trigger data corruption or oops inside the
kernel.

Jason