Re: [PATCH 2/3] mm: provide mapping_wrprotect_page() function

On Mon, Feb 03, 2025 at 04:49:34PM +0100, Simona Vetter wrote:
> On Fri, Jan 31, 2025 at 06:28:57PM +0000, Lorenzo Stoakes wrote:
> > In the fb_defio video driver, page dirty state is used to determine when
> > frame buffer pages have been changed, allowing for batched, deferred I/O to
> > be performed for efficiency.
> >
> > This implementation had only one means of doing so effectively - the use of
> > the folio_mkclean() function.
> >
> > However, this use of the function is inappropriate, as the fb_defio
> > implementation allocates kernel memory to back the framebuffer, and then is
> > forced to specify page->index, mapping fields in order to permit the
> > folio_mkclean() rmap traversal to proceed correctly.
> >
> > It is not correct to specify these fields on kernel-allocated memory;
> > moreover, since these are not folios, page->index and page->mapping are
> > deprecated fields, soon to be removed.
> >
> > We therefore need to provide a means by which we can correctly traverse the
> > reverse mapping and write-protect mappings for a page backing an
> > address_space page cache object at a given offset.
> >
> > This patch provides this - mapping_wrprotect_page() allows for this
> > operation to be performed for a specified address_space, offset and page,
> > without requiring a folio or, of course, an inappropriate use of
> > page->index, mapping.
> >
> > With this provided, we can subsequently adjust the fb_defio implementation
> > to make use of this function and avoid incorrect invocation of
> > folio_mkclean() and more importantly, incorrect manipulation of
> > page->index, mapping fields.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > ---
> >  include/linux/rmap.h |  3 ++
> >  mm/rmap.c            | 73 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 76 insertions(+)
> >
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 683a04088f3f..0bf5f64884df 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -739,6 +739,9 @@ unsigned long page_address_in_vma(const struct folio *folio,
> >   */
> >  int folio_mkclean(struct folio *);
> >
> > +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> > +		unsigned long nr_pages, struct page *page);
> > +
> >  int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> >  		      struct vm_area_struct *vma);
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index a2ff20c2eccd..bb5a42d95c48 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1127,6 +1127,79 @@ int folio_mkclean(struct folio *folio)
> >  }
> >  EXPORT_SYMBOL_GPL(folio_mkclean);
> >
> > +struct wrprotect_file_state {
> > +	int cleaned;
> > +	pgoff_t pgoff;
> > +	unsigned long pfn;
> > +	unsigned long nr_pages;
> > +};
> > +
> > +static bool mapping_wrprotect_page_one(struct folio *folio,
> > +		struct vm_area_struct *vma, unsigned long address, void *arg)
> > +{
> > +	struct wrprotect_file_state *state = (struct wrprotect_file_state *)arg;
> > +	struct page_vma_mapped_walk pvmw = {
> > +		.pfn		= state->pfn,
> > +		.nr_pages	= state->nr_pages,
> > +		.pgoff		= state->pgoff,
> > +		.vma		= vma,
> > +		.address	= address,
> > +		.flags		= PVMW_SYNC,
> > +	};
> > +
> > +	state->cleaned += page_vma_mkclean_one(&pvmw);
> > +
> > +	return true;
> > +}
> > +
> > +static void __rmap_walk_file(struct folio *folio, struct address_space *mapping,
> > +			     pgoff_t pgoff_start, unsigned long nr_pages,
> > +			     struct rmap_walk_control *rwc, bool locked);
> > +
> > +/**
> > + * mapping_wrprotect_page() - Write protect all mappings of this page.
> > + *
> > + * @mapping:	The mapping whose reverse mapping should be traversed.
> > + * @pgoff:	The page offset at which @page is mapped within @mapping.
> > + * @nr_pages:	The number of physically contiguous base pages spanned.
> > + * @page:	The page mapped in @mapping at @pgoff.
> > + *
> > + * Traverses the reverse mapping, finding all VMAs which contain a shared
> > + * mapping of the single @page in @mapping at offset @pgoff and write-protecting
> > + * the mappings.
> > + *
> > + * The page does not have to be a folio, but rather can be a kernel allocation
> > + * that is mapped into userland. We therefore do not require that the page maps
> > + * to a folio with a valid mapping or index field, rather these are specified in
> > + * @mapping and @pgoff.
> > + *
> > + * Return: the number of write-protected PTEs, or an error.
> > + */
> > +int mapping_wrprotect_page(struct address_space *mapping, pgoff_t pgoff,
> > +		unsigned long nr_pages, struct page *page)
> > +{
> > +	struct wrprotect_file_state state = {
> > +		.cleaned = 0,
> > +		.pgoff = pgoff,
> > +		.pfn = page_to_pfn(page),
>
> Could we go one step further and entirely drop the struct page? Similar to
> unmap_mapping_range for VM_SPECIAL mappings, except it only updates the
> write protection. The reason is that ideally we'd like fbdev defio to
> entirely get rid of any struct page usage, because with some dma_alloc()
> memory regions there's simply no struct page for them (it's a carveout).
> See e.g. commit 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if
> necessary") for some of the pain this has caused.
>
> So entirely struct page less way to write protect a pfn would be best. And
> it doesn't look like you need the page here at all?

In the original version [1] we did indeed take a PFN, so this shouldn't be
a problem to change.

Since we make it possible here to explicitly reference the address_space
object mapping the range, and from that we can find all the VMAs that map
the page range [pgoff, pgoff + nr_pages), I don't think we need to deal
with a struct page here at all.

The defio code does seem to have some questionable assumptions in place, or
at least ones I couldn't explain away, re: attempting to folio-lock (the
non-folios...), so there'd need to be changes on that side, which would
probably be best left to a follow-up series given this one's urgency.

But I'm more than happy to make this interface work with that by doing
another revision which takes a PFN only. I think something like:

int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
		unsigned long pfn, unsigned long nr_pages);

should work?
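
Roughly, just adapting the mapping_wrprotect_page() body from the patch
above so the PFN comes straight from the caller rather than from
page_to_pfn() - an untested sketch:

int mapping_wrprotect_range(struct address_space *mapping, pgoff_t pgoff,
		unsigned long pfn, unsigned long nr_pages)
{
	struct wrprotect_file_state state = {
		.cleaned = 0,
		.pgoff = pgoff,
		/* Caller supplies the PFN directly, no struct page needed. */
		.pfn = pfn,
		.nr_pages = nr_pages,
	};
	struct rmap_walk_control rwc = {
		.arg = (void *)&state,
		.rmap_one = mapping_wrprotect_page_one,
		.invalid_vma = invalid_mkclean_vma,
	};

	if (!mapping)
		return 0;

	__rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
			 /* locked = */false);

	return state.cleaned;
}

That way fb_defio can derive the PFN however it likes - page_to_pfn() for
now, or whatever the dma_alloc() carveout gives us later - without a
struct page ever entering the picture.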

[1]: https://lore.kernel.org/all/cover.1736352361.git.lorenzo.stoakes@xxxxxxxxxx/

>
> Cheers, Sima

Thanks!

>
>
> > +		.nr_pages = nr_pages,
> > +	};
> > +	struct rmap_walk_control rwc = {
> > +		.arg = (void *)&state,
> > +		.rmap_one = mapping_wrprotect_page_one,
> > +		.invalid_vma = invalid_mkclean_vma,
> > +	};
> > +
> > +	if (!mapping)
> > +		return 0;
> > +
> > +	__rmap_walk_file(/* folio = */NULL, mapping, pgoff, nr_pages, &rwc,
> > +			 /* locked = */false);
> > +
> > +	return state.cleaned;
> > +}
> > +EXPORT_SYMBOL_GPL(mapping_wrprotect_page);
> > +
> >  /**
> >   * pfn_mkclean_range - Cleans the PTEs (including PMDs) mapped with range of
> >   *                     [@pfn, @pfn + @nr_pages) at the specific offset (@pgoff)
> > --
> > 2.48.1
> >
>
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


