On Fri, Jan 31, 2025 at 06:28:57PM +0000, Lorenzo Stoakes wrote:
> In the fb_defio video driver, page dirty state is used to determine when
> frame buffer pages have been changed, allowing for batched, deferred I/O
> to be performed for efficiency.
>
> This implementation had only one means of doing so effectively - the use
> of the folio_mkclean() function.
>
> However, this use of the function is inappropriate, as the fb_defio
> implementation allocates kernel memory to back the framebuffer, and then
> is forced to specify page->index, mapping fields in order to permit the
> folio_mkclean() rmap traversal to proceed correctly.
>
> It is not correct to specify these fields on kernel-allocated memory, and
> moreover since these are not folios, page->index, mapping are deprecated
> fields, soon to be removed.
>
> We therefore need to provide a means by which we can correctly traverse
> the reverse mapping and write-protect mappings for a page backing an
> address_space page cache object at a given offset.
>
> This patch provides this - mapping_wrprotect_page() allows for this
> operation to be performed for a specified address_space, offset and page,
> without requiring a folio nor, of course, an inappropriate use of
> page->index, mapping.
>
> With this provided, we can subsequently adjust the fb_defio implementation
> to make use of this function and avoid incorrect invocation of
> folio_mkclean() and, more importantly, incorrect manipulation of
> page->index, mapping fields.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> ---
>  include/linux/rmap.h |  3 ++
>  mm/rmap.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 683a04088f3f..0bf5f64884df 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -739,6 +739,9 @@ unsigned long page_address_in_vma(const struct folio *folio,
>   */
>  int folio_mkclean(struct folio *);
>
> +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> +		unsigned long nr_pages, struct page *page);
> +
>  int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
>  		      struct vm_area_struct *vma);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a2ff20c2eccd..bb5a42d95c48 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1127,6 +1127,79 @@ int folio_mkclean(struct folio *folio)
>  }
>  EXPORT_SYMBOL_GPL(folio_mkclean);
>
> +struct wrprotect_file_state {
> +	int cleaned;
> +	pgoff_t pgoff;
> +	unsigned long pfn;
> +	unsigned long nr_pages;
> +};
> +
> +static bool mapping_wrprotect_page_one(struct folio *folio,
> +		struct vm_area_struct *vma, unsigned long address, void *arg)
> +{
> +	struct wrprotect_file_state *state = (struct wrprotect_file_state *)arg;
> +	struct page_vma_mapped_walk pvmw = {
> +		.pfn = state->pfn,
> +		.nr_pages = state->nr_pages,
> +		.pgoff = state->pgoff,
> +		.vma = vma,
> +		.address = address,
> +		.flags = PVMW_SYNC,
> +	};
> +
> +	state->cleaned += page_vma_mkclean_one(&pvmw);
> +
> +	return true;
> +}
> +
> +static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
> +		pgoff_t pgoff_start, unsigned long nr_pages,
> +		struct rmap_walk_control *rwc, bool locked);
> +
> +/**
> + * mapping_wrprotect_page() - Write protect all mappings of this page.
> + *
> + * @mapping: The mapping whose reverse mapping should be traversed.
> + * @pgoff: The page offset at which @page is mapped within @mapping.
> + * @nr_pages: The number of physically contiguous base pages spanned.
> + * @page: The page mapped in @mapping at @pgoff.
> + *
> + * Traverses the reverse mapping, finding all VMAs which contain a shared
> + * mapping of the single @page in @mapping at offset @pgoff and
> + * write-protecting the mappings.
> + *
> + * The page does not have to be a folio, but rather can be a kernel
> + * allocation that is mapped into userland. We therefore do not require
> + * that the page maps to a folio with a valid mapping or index field,
> + * rather these are specified in @mapping and @pgoff.
> + *
> + * Return: the number of write-protected PTEs, or an error.
> + */
> +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> +		unsigned long nr_pages, struct page *page)
> +{
> +	struct wrprotect_file_state state = {
> +		.cleaned = 0,
> +		.pgoff = pgoff,
> +		.pfn = page_to_pfn(page),

Could we go one step further and entirely drop the struct page? Similar to
unmap_mapping_range() for VM_SPECIAL mappings, except it only updates the
write protection.

The reason is that ideally we'd like fbdev defio to entirely get rid of any
struct page usage, because with some dma_alloc() memory regions there's
simply no struct page for them (it's a carveout). See e.g.
5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary") for
some of the pain this has caused.

So an entirely struct-page-less way to write-protect a pfn would be best.
And it doesn't look like you need the page here at all?

Cheers, Sima

> +		.nr_pages = nr_pages,
> +	};
> +	struct rmap_walk_control rwc = {
> +		.arg = (void *)&state,
> +		.rmap_one = mapping_wrprotect_page_one,
> +		.invalid_vma = invalid_mkclean_vma,
> +	};
> +
> +	if (!mapping)
> +		return 0;
> +
> +	__rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
> +			/* locked = */false);
> +
> +	return state.cleaned;
> +}
> +EXPORT_SYMBOL_GPL(mapping_wrprotect_page);
> +
>  /**
>   * pfn_mkclean_range - Cleans the PTEs (including PMDs) mapped with range of
>   * [@pfn, @pfn + @nr_pages) at the specific offset (@pgoff)
> --
> 2.48.1
>

--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
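For illustration, a minimal sketch of the struct-page-less interface Sima
suggests might look like the following. The name mapping_wrprotect_range()
and its exact signature are assumptions rather than part of the posted
patch; it simply reuses the wrprotect_file_state walk from the quoted code
with a caller-supplied pfn in place of page_to_pfn():

/*
 * Hypothetical sketch only: like mapping_wrprotect_page() above, but the
 * caller passes the pfn directly, so no struct page is ever needed (e.g.
 * for carveout / dma_alloc()-backed framebuffers that have no struct page).
 */
int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
		unsigned long pfn, unsigned long nr_pages)
{
	struct wrprotect_file_state state = {
		.cleaned = 0,
		.pgoff = pgoff,
		.pfn = pfn,		/* caller-supplied, no page_to_pfn() */
		.nr_pages = nr_pages,
	};
	struct rmap_walk_control rwc = {
		.arg = (void *)&state,
		.rmap_one = mapping_wrprotect_page_one,
		.invalid_vma = invalid_mkclean_vma,
	};

	if (!mapping)
		return 0;

	/* Walk all VMAs mapping this range of @mapping and write-protect them. */
	__rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
			/* locked = */false);

	return state.cleaned;
}

With something along these lines, fb_defio could pass whatever pfn it
already knows for the framebuffer range and never touch page->index or
page->mapping at all.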