On Thu, May 04, 2023 at 10:27:50PM +0100, Lorenzo Stoakes wrote:
> Writing to file-backed mappings which require folio dirty tracking using
> GUP is a fundamentally broken operation, as kernel write access to GUP
> mappings do not adhere to the semantics expected by a file system.
>
> A GUP caller uses the direct mapping to access the folio, which does not
> cause write notify to trigger, nor does it enforce that the caller marks
> the folio dirty.

Okay, the problem is clear and the patchset looks good to me. But I'm
worried about breaking existing users.

Do we expect the change to be visible to real-world users? If yes, are we
okay to break them?

One thing that came to mind is KVM with "qemu -object
memory-backend-file,share=on..." It is mostly used for pmem emulation.

Do we have a plan B?

Just a random/crazy/broken idea:

 - Allow folio_mkclean() (and folio_clear_dirty_for_io()) to fail,
   indicating that the page cannot be cleared because it is pinned;

 - Introduce a new vm_operations_struct::mkclean() that would be called by
   page_vma_mkclean_one() before clearing the range and can fail;

 - On GUP, create an in-kernel fake VMA that represents the file, but with
   custom vm_ops. The VMA is registered in rmap to get notified on
   folio_mkclean() and fail it because of GUP.

 - folio_clear_dirty_for_io() callers will handle the new failure as an
   indication that the page can be written back but will stay dirty, and
   fs-specific data that is associated with the page writeback cannot be
   freed.

I'm sure the idea is broken on many levels (I have never looked closely at
the writeback path). But maybe it is good enough as a conversation starter?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov