On Fri, Nov 29, 2024 at 02:12:23PM +0100, David Hildenbrand wrote:
> On 29.11.24 14:02, Lorenzo Stoakes wrote:
> > On Fri, Nov 29, 2024 at 01:59:01PM +0100, David Hildenbrand wrote:
> > > On 29.11.24 13:55, Lorenzo Stoakes wrote:
> > > > On Fri, Nov 29, 2024 at 01:45:42PM +0100, David Hildenbrand wrote:
> > > > > On 29.11.24 13:26, Peter Zijlstra wrote:
> > > > > > On Fri, Nov 29, 2024 at 01:12:57PM +0100, David Hildenbrand wrote:
> > > > > > >
> > > > > > > Well, I think we simply will want vm_insert_pages_prot() that stops treating
> > > > > > > these things like folios :) . *likely* we'd want a distinct memdesc/type.
> > > > > > >
> > > > > > > We could start that work right now by making some user (iouring,
> > > > > > > ring_buffer) set a new page->_type, and checking that in
> > > > > > > vm_insert_pages_prot() + vm_normal_page(). If set, don't touch the refcount
> > > > > > > and the mapcount.
> > > > > > >
> > > > > > > Because then, we can just make all the relevant drivers set the type, refuse
> > > > > > > in vm_insert_pages_prot() anything that doesn't have the type set, and
> > > > > > > refuse in vm_normal_page() any pages with this memdesc.
> > > > > > >
> > > > > > > Maybe we'd have to teach CoW to copy from such pages, maybe not. GUP of
> > > > > > > these things will stop working, I hope that is not a problem.
> > > > > >
> > > > > > Well... perf-tool likes to call write() upon these pages in order to
> > > > > > write out the data from the mmap() into a file.
> > > >
> > > > I'm confused about what you mean, write() using the fd should work fine, how
> > > > would they interact with the mmap? I mean be making a silly mistake here
> > >
> > > write() to file from the mmap()'ed address range to *some* file.
> > >
> >
> > Yeah sorry my brain melted down briefly, for some reason was thinking of read()
> > writing into the buffer...
> >
> > > This will GUP the pages you inserted.
> > >
> > > GUP does not work on PFNMAP.
> >
> > Well it _does_ if struct page **pages is set to NULL :)
>
> Hm? :)
>
> check_vma_flags() unconditionally refuses VM_PFNMAP.

Ha, funny, with my name all over git blame there... OK yup, missed this. The
vm_normal_page() == NULL stuff must be for mixed map (and those other weird
cases I think you can get)... Well, good.

Where is write() invoking GUP? I'm kind of surprised it's not just using
uaccess?

One thing to note is that I did run all the perf tests with no issues
whatsoever. You would _think_ this would have come up... I'm editing some test
code to explicitly write() from the buffer anyway to see.

If we can't do pfnmap, and we definitely can't do mixedmap (because it's
basically equivalent in every way to just faulting in the pages as before and
requires the same hacks), then I will have to go back to the drawing board or
somehow change the faulting code.

This really sucks. I'm not quite sure I even understand why we don't allow GUP
used _just for pinning_ on VM_PFNMAP when it is -in effect- already pinned, on
the assumption that whatever mapped it will maintain the lifetime.

What a mess...

>
> --
> Cheers,
>
> David / dhildenb
>
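For reference, the userspace side of the check Lorenzo describes (explicitly write()ing out of an mmap()'ed buffer into a file, as perf-tool does) can be sketched roughly as below: map a region, fill it, pass the mapped address directly as the source of write(), then read the file back and compare. This is a minimal sketch under stated assumptions, not the test mentioned in the thread: the helper name `write_from_mapping()` and the `/tmp` path template are hypothetical, and an anonymous private mapping stands in for the perf ring buffer, so it only exercises ordinary anonymous pages. Probing the VM_PFNMAP case under discussion would mean using the address returned by mmap() on a perf_event_open() file descriptor instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes, fill them with a pattern, write()
 * the mapped range to a temporary file, then read it back and compare.
 *
 * Assumption: an anonymous MAP_PRIVATE mapping stands in for the perf ring
 * buffer. For the real case, `buf` would be the mmap() of a
 * perf_event_open() fd instead.
 *
 * Returns 0 on success, -1 if mapping, writing or verification fails.
 */
static int write_from_mapping(size_t len)
{
    char path[] = "/tmp/mmap-write-XXXXXX"; /* hypothetical path template */
    unsigned char *buf, *check = NULL;
    size_t done = 0;
    int fd, ret = -1;

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Fault the pages in and give them a recognisable pattern. */
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i & 0xff);

    fd = mkstemp(path);
    if (fd < 0)
        goto out_unmap;
    unlink(path); /* drop the name; the file lives until fd is closed */

    /* The source of this write() is the mmap()'ed range itself. */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0)
            goto out_close;
        done += (size_t)n;
    }

    /* Read the file back and check it matches the mapping. */
    check = malloc(len);
    if (check && pread(fd, check, len, 0) == (ssize_t)len &&
        memcmp(check, buf, len) == 0)
        ret = 0;
    free(check);

out_close:
    close(fd);
out_unmap:
    munmap(buf, len);
    return ret;
}
```

Calling `write_from_mapping(4096)` from `main()` should return 0 on an ordinary system; a nonzero return means the mapping, the write() loop or the read-back comparison failed.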