On Tue, Jan 23, 2024 at 05:20:55PM +0000, Matthew Wilcox wrote:
> We're currently trying to remove page->mapping from the entire kernel.
> This has me interested in fb_defio and since I made such a mess of it
> with commits ccf953d8f3d6 / 0b78f8bcf495, I'd like to discuss what to
> do before diving in.
>
> Folios continue to have a mapping. So we can effectively do
> page_folio(page)->mapping (today that's calling compound_head() to get
> to the head page; eventually it's a separate allocation).
>
> But now I look at commit 56c134f7f1b5, I'm a little scared.
> Apparently pages are being allocated from shmem and being mapped by
> fb_deferred_io_fault()? This line:
>
>	page->mapping = vmf->vma->vm_file->f_mapping;
>
> initially appears harmless for shmem files (because that's presumably
> a noop), but it's only a noop for head pages. If shmem allocates a
> compound page (ie a 2MB THP today), you'll overlay some information
> stored in the second and third pages; looks like _entire_mapcount
> and _deferred_list.prev (but we do shift those fields around without
> regard to what the fb_defio driver is doing). Even if you've disabled
> THP today, setting page->mapping to NULL in fb_deferred_io_lastclose()
> for a shmem page is a really bad idea.
>
> I'd like to avoid fb_defio playing with page->mapping at all.
> As I understand it, the only reason to set page->mapping is so that
> page_mkclean() works. But there are all kinds of assumptions in
> page_mkclean() (now essentially folio_mkclean()) that we're dealing with
> file-backed or anonymous memory. I wonder if we might be better off
> calling pfn_mkclean_range() for each VMA which maps these allocations?
> You'd have to keep track of each VMA yourself (I think?) but it would
> avoid messing with page->mapping.
>
> Anyway, I don't know enough about DRM. There might be something
> unutterably obvious we could do to fix this.

It's just really old code that's been barely touched to keep it going.
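To make the pfn_mkclean_range() idea above concrete, something like the
sketch below could work: fb_defio tracks its mapping vmas itself (say,
registered from its vm_ops open/close) and write-protects the range in
each one. Only pfn_mkclean_range() is the existing rmap helper here; the
fbdefio_vma structure and the vmas list on struct fb_deferred_io are
made up for illustration.

```c
/*
 * Hypothetical sketch: instead of setting page->mapping so that
 * page_mkclean() can find the mappings via rmap, fb_defio keeps its
 * own list of vmas and calls pfn_mkclean_range() on each of them.
 */
struct fbdefio_vma {
	struct list_head node;
	struct vm_area_struct *vma;
};

static void fbdefio_mkclean(struct fb_info *info, unsigned long pfn,
			    unsigned long nr_pages, pgoff_t pgoff)
{
	struct fb_deferred_io *fbdefio = info->fbdefio;
	struct fbdefio_vma *fv;

	/* fbdefio->vmas (and whatever lock protects it) is invented
	 * for this sketch; it would be maintained from vm_ops
	 * ->open()/->close(). */
	list_for_each_entry(fv, &fbdefio->vmas, node)
		pfn_mkclean_range(pfn, nr_pages, pgoff, fv->vma);
}
```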
The issue is that the entire defio stuff is a pretty bad layering
violation. Not sure what the cleanest way to do this would be if it
only touches the ptes and nothing else, or what the right function for
a bit of pte walking would be. That would still potentially mess with
the mapping set up by the gpu memory allocator in bad ways, but I think
at least for all current ones it should avoid problems. Definitely
agree that messing with struct page in any way is really bad, we simply
didn't get that far yet.

I think the cleanest way would be to have an fb_mmap only for drm
drivers in drm_fbdev_generic.c, sunset fb_deferred_io_mmap, and use
that to just replicate the ptes from the kernel's vmap into one that is
ok for userspace. The fbdev implementation in drm_fbdev_generic.c is
the only one left in drm that supports fb_defio, so that would catch
all of them. To my knowledge all the other defio implementations in
native fbdev drivers aren't problematic, since none use shmem.

For a while we pondered proxying the vma to the driver's native drm
mmap implementation, but that gets real messy, plus there's no benefit
because fbdev assumes a bit too much that the memory is permanently
pinned. So we need the pinned kernel vmap anyway.

Cheers, Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
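[For reference, a rough sketch of what such a drm-only fb_mmap
replicating the kernel vmap's ptes could look like. This is only an
illustration: the function name is hypothetical, the eager prefault
loop is one possible approach, and real dirty tracking would still need
the pfns write-protected again (e.g. via pfn_mkclean_range()) after
each deferred-io flush. vmf_insert_pfn(), vmalloc_to_pfn() and
vm_flags_set() are the existing kernel helpers assumed here.]

```c
/*
 * Hypothetical sketch of an fb_mmap for drm_fbdev_generic.c: rather
 * than fb_deferred_io_mmap() faulting in shmem pages and touching
 * page->mapping, insert the pfns backing the pinned kernel vmap
 * directly as a VM_PFNMAP mapping, so struct page is never involved.
 */
static int drm_fbdev_generic_fb_mmap(struct fb_info *info,
				     struct vm_area_struct *vma)
{
	void *vaddr = info->screen_buffer; /* the pinned kernel vmap */
	unsigned long nr = vma_pages(vma);
	unsigned long i;

	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);

	/* Eagerly mirror each pte of the vmap into the userspace vma. */
	for (i = 0; i < nr; i++) {
		unsigned long pfn =
			vmalloc_to_pfn(vaddr + (i << PAGE_SHIFT));

		if (vmf_insert_pfn(vma, vma->vm_start + (i << PAGE_SHIFT),
				   pfn) != VM_FAULT_NOPAGE)
			return -EFAULT;
	}
	return 0;
}
```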