On Mon, Feb 03, 2025 at 04:30:04PM +0000, Lorenzo Stoakes wrote:
> On Mon, Feb 03, 2025 at 04:49:34PM +0100, Simona Vetter wrote:
> > On Fri, Jan 31, 2025 at 06:28:57PM +0000, Lorenzo Stoakes wrote:
> > > In the fb_defio video driver, page dirty state is used to determine when
> > > frame buffer pages have been changed, allowing for batched, deferred I/O
> > > to be performed for efficiency.
> > >
> > > This implementation had only one means of doing so effectively - the use
> > > of the folio_mkclean() function.
> > >
> > > However, this use of the function is inappropriate, as the fb_defio
> > > implementation allocates kernel memory to back the framebuffer, and then
> > > is forced to specify page->index, mapping fields in order to permit the
> > > folio_mkclean() rmap traversal to proceed correctly.
> > >
> > > It is not correct to specify these fields on kernel-allocated memory, and
> > > moreover since these are not folios, page->index, mapping are deprecated
> > > fields, soon to be removed.
> > >
> > > We therefore need to provide a means by which we can correctly traverse
> > > the reverse mapping and write-protect mappings for a page backing an
> > > address_space page cache object at a given offset.
> > >
> > > This patch provides this - mapping_wrprotect_page() allows for this
> > > operation to be performed for a specified address_space, offset and page,
> > > without requiring a folio nor, of course, an inappropriate use of
> > > page->index, mapping.
> > >
> > > With this provided, we can subsequently adjust the fb_defio implementation
> > > to make use of this function and avoid incorrect invocation of
> > > folio_mkclean() and, more importantly, incorrect manipulation of the
> > > page->index, mapping fields.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > > ---
> > >  include/linux/rmap.h |  3 ++
> > >  mm/rmap.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 76 insertions(+)
> > >
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index 683a04088f3f..0bf5f64884df 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -739,6 +739,9 @@ unsigned long page_address_in_vma(const struct folio *folio,
> > >   */
> > >  int folio_mkclean(struct folio *);
> > >
> > > +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> > > +                unsigned long nr_pages, struct page *page);
> > > +
> > >  int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> > >                        struct vm_area_struct *vma);
> > >
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index a2ff20c2eccd..bb5a42d95c48 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -1127,6 +1127,79 @@ int folio_mkclean(struct folio *folio)
> > >  }
> > >  EXPORT_SYMBOL_GPL(folio_mkclean);
> > >
> > > +struct wrprotect_file_state {
> > > +        int cleaned;
> > > +        pgoff_t pgoff;
> > > +        unsigned long pfn;
> > > +        unsigned long nr_pages;
> > > +};
> > > +
> > > +static bool mapping_wrprotect_page_one(struct folio *folio,
> > > +                struct vm_area_struct *vma, unsigned long address, void *arg)
> > > +{
> > > +        struct wrprotect_file_state *state = (struct wrprotect_file_state *)arg;
> > > +        struct page_vma_mapped_walk pvmw = {
> > > +                .pfn = state->pfn,
> > > +                .nr_pages = state->nr_pages,
> > > +                .pgoff = state->pgoff,
> > > +                .vma = vma,
> > > +                .address = address,
> > > +                .flags = PVMW_SYNC,
> > > +        };
> > > +
> > > +        state->cleaned += page_vma_mkclean_one(&pvmw);
> > > +
> > > +        return true;
> > > +}
> > > +
> > > +static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
> > > +                pgoff_t pgoff_start, unsigned long nr_pages,
> > > +                struct rmap_walk_control *rwc, bool locked);
> > > +
> > > +/**
> > > + * mapping_wrprotect_page() - Write protect all mappings of this page.
> > > + *
> > > + * @mapping: The mapping whose reverse mapping should be traversed.
> > > + * @pgoff: The page offset at which @page is mapped within @mapping.
> > > + * @nr_pages: The number of physically contiguous base pages spanned.
> > > + * @page: The page mapped in @mapping at @pgoff.
> > > + *
> > > + * Traverses the reverse mapping, finding all VMAs which contain a shared
> > > + * mapping of the single @page in @mapping at offset @pgoff and
> > > + * write-protecting the mappings.
> > > + *
> > > + * The page does not have to be a folio, but rather can be a kernel
> > > + * allocation that is mapped into userland. We therefore do not require that
> > > + * the page maps to a folio with a valid mapping or index field, rather
> > > + * these are specified in @mapping and @pgoff.
> > > + *
> > > + * Return: the number of write-protected PTEs, or an error.
> > > + */
> > > +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> > > +                unsigned long nr_pages, struct page *page)
> > > +{
> > > +        struct wrprotect_file_state state = {
> > > +                .cleaned = 0,
> > > +                .pgoff = pgoff,
> > > +                .pfn = page_to_pfn(page),
> >
> > Could we go one step further and entirely drop the struct page? Similar to
> > unmap_mapping_range for VM_SPECIAL mappings, except it only updates the
> > write protection. The reason is that ideally we'd like fbdev defio to
> > entirely get rid of any struct page usage, because with some dma_alloc()
> > memory regions there's simply no struct page for them (it's a carveout).
> > See e.g. 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if
> > necessary") for some of the pain this has caused.
> >
> > So an entirely struct-page-less way to write protect a pfn would be best.
> > And it doesn't look like you need the page here at all?
>
> In the original version [1] we did indeed take a PFN, so this shouldn't be
> a problem to change.
>
> Since we make it possible here to explicitly reference the address_space
> object mapping the range, and from that can find all the VMAs that map the
> page range [pgoff, pgoff + nr_pages), I don't think we need to think about
> a struct page here at all.
>
> The defio code does seem to have some questionable assumptions in place, or
> at least ones I couldn't explain away re: attempting to folio-lock (the
> non-folios...), so there'd need to be changes on that side, which I suggest
> would probably be best left for a follow-up series given this one's urgency.

Yeah, there's a bunch more things we need to do to get there. It was the
lack of a pfn-based core mm function that stopped us from doing that thus
far, plus also fbdev defio being very low priority. But it would definitely
avoid a bunch of corner cases and duplication in the fbdev emulation code
in drivers/gpu/drm.

> But I'm more than happy to make this interface work with that by doing
> another revision where we export a PFN only. I think something like:
>
> int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
>                 unsigned long pfn, unsigned long nr_pages);
>
> Should work?
>
> [1]: https://lore.kernel.org/all/cover.1736352361.git.lorenzo.stoakes@xxxxxxxxxx/

Yup, that looks like the thing we'll need to wean defio off all that
questionable folio/page wrangling. But like you say, it should be easy to
add/update when we get there.
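
I'd assume the pfn variant then really is just the function from your diff
above with page_to_pfn() dropped, i.e. something like this (untested sketch
only, reusing the helpers your patch already adds):

int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
                unsigned long pfn, unsigned long nr_pages)
{
        struct wrprotect_file_state state = {
                .cleaned = 0,
                .pgoff = pgoff,
                .pfn = pfn,
                .nr_pages = nr_pages,
        };
        struct rmap_walk_control rwc = {
                .arg = (void *)&state,
                .rmap_one = mapping_wrprotect_page_one,
                .invalid_vma = invalid_mkclean_vma,
        };

        /* No mapping means nothing is mapped, so nothing to write-protect. */
        if (!mapping)
                return 0;

        __rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
                        /* locked = */false);

        return state.cleaned;
}

The defio side could then hand in whatever pfn backs the framebuffer,
whether or not there is a struct page behind it.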

Thanks, Sima

> > >
> > Cheers, Sima
>
> Thanks!
>
> > > +                .nr_pages = nr_pages,
> > > +        };
> > > +        struct rmap_walk_control rwc = {
> > > +                .arg = (void *)&state,
> > > +                .rmap_one = mapping_wrprotect_page_one,
> > > +                .invalid_vma = invalid_mkclean_vma,
> > > +        };
> > > +
> > > +        if (!mapping)
> > > +                return 0;
> > > +
> > > +        __rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
> > > +                        /* locked = */false);
> > > +
> > > +        return state.cleaned;
> > > +}
> > > +EXPORT_SYMBOL_GPL(mapping_wrprotect_page);
> > > +
> > >  /**
> > >   * pfn_mkclean_range - Cleans the PTEs (including PMDs) mapped with range of
> > >   * [@pfn, @pfn + @nr_pages) at the specific offset (@pgoff)
> > > --
> > > 2.48.1
> > >
> >
> > --
> > Simona Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch