Re: [RFC v2 0/3] Turn iomap_page_ops into iomap_folio_ops

Andreas Gruenbacher <agruenba@xxxxxxxxxx> · Fri, 2 Dec 2022 02:54:00 +0100

On Thu, Dec 1, 2022 at 10:30 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Dec 01, 2022 at 07:09:54PM +0100, Andreas Gruenbacher wrote:
> > Hi again,
> >
> > [Same thing, but with the patches split correctly this time.]
> >
> > we're seeing a race between journaled data writes and the shrinker on
> > gfs2.  What's happening is that gfs2_iomap_page_done() is called after
> > the page has been unlocked, so try_to_free_buffers() can come in and
> > free the buffers while gfs2_iomap_page_done() is trying to add them to
> > the transaction.  Not good.
> >
> > This is a proposal to change iomap_page_ops so that page_prepare()
> > prepares the write and grabs the locked page, and page_done() unlocks
> > and puts that page again.  While at it, this also converts the hooks
> > from pages to folios.
> >
> > To move the pagecache_isize_extended() call in iomap_write_end() out of
> > the way, a new folio_may_straddle_isize() helper is introduced that
> > takes a locked folio.  That is then used when the inode size is updated,
> > before the folio is unlocked.
> >
> > I've also converted the other applicable folio_may_straddle_isize()
> > users, namely generic_write_end(), ext4_write_end(), and
> > ext4_journalled_write_end().
> >
> > Any thoughts?
>
> I doubt that moving page cache operations from the iomap core to
> filesystem specific callouts will be acceptible. I recently proposed
> patches that added page cache walking to an XFS iomap callout to fix
> a data corruption, but they were NAKd on the basis that iomap is
> supposed to completely abstract away the folio and page cache
> manipulations from the filesystem.

Right. The resulting code is really quite disgusting, for a
fundamentalist dream of abstraction.

> This patchset seems to be doing the same thing - moving page cache
> and folio management directly in filesystem specific callouts. Hence
> I'm going to assume that the same architectural demarcation is
> going to apply here, too...
>
> FYI, there is already significant change committed to the iomap
> write path in the current XFS tree as a result of the changes I
> mention - there is stale IOMAP detection which adds a new page ops
> method and adds new error paths with a locked folio in
> iomap_write_begin().

That would have belonged on the iomap-for-next branch rather than in
the middle of a bunch of xfs commits.

> And this other data corruption (and performance) fix for handling
> zeroing over unwritten extents properly:
>
> https://lore.kernel.org/linux-xfs/20221201005214.3836105-1-david@xxxxxxxxxxxxx/
>
> changes the way folios are looked up and instantiated in the page
> cache in iomap_write_begin(). It also adds new error conditions that
> need to be returned to callers so to implement conditional "folio
> must be present and dirty" page cache zeroing from
> iomap_zero_iter(). Those semantics would also have to be supported
> by gfs2, and that greatly complicates modifying and testing iomap
> core changes.
>
> To avoid all this, can we simple move the ->page_done() callout in
> the error path and iomap_write_end() to before we unlock the folio?
> You've already done that for pagecache_isize_extended(), and I can't
> see anything obvious in the gfs2 ->page_done callout that
> would cause issues if it is called with a locked dirty folio...

Yes, I guess we can do that once pagecache_isize_extended() is
replaced by folio_may_straddle_isize().

Can people please scrutinize the math in folio_may_straddle_isize() in
particular?

Thanks,
Andreas

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>