On Thu, Dec 1, 2022 at 10:30 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > On Thu, Dec 01, 2022 at 07:09:54PM +0100, Andreas Gruenbacher wrote: > > Hi again, > > > > [Same thing, but with the patches split correctly this time.] > > > > we're seeing a race between journaled data writes and the shrinker on > > gfs2. What's happening is that gfs2_iomap_page_done() is called after > > the page has been unlocked, so try_to_free_buffers() can come in and > > free the buffers while gfs2_iomap_page_done() is trying to add them to > > the transaction. Not good. > > > > This is a proposal to change iomap_page_ops so that page_prepare() > > prepares the write and grabs the locked page, and page_done() unlocks > > and puts that page again. While at it, this also converts the hooks > > from pages to folios. > > > > To move the pagecache_isize_extended() call in iomap_write_end() out of > > the way, a new folio_may_straddle_isize() helper is introduced that > > takes a locked folio. That is then used when the inode size is updated, > > before the folio is unlocked. > > > > I've also converted the other applicable folio_may_straddle_isize() > > users, namely generic_write_end(), ext4_write_end(), and > > ext4_journalled_write_end(). > > > > Any thoughts? > > I doubt that moving page cache operations from the iomap core to > filesystem specific callouts will be acceptible. I recently proposed > patches that added page cache walking to an XFS iomap callout to fix > a data corruption, but they were NAKd on the basis that iomap is > supposed to completely abstract away the folio and page cache > manipulations from the filesystem. Right. The resulting code is really quite disgusting, for a fundamentalist dream of abstraction. > This patchset seems to be doing the same thing - moving page cache > and folio management directly in filesystem specific callouts. Hence > I'm going to assume that the same architectural demarcation is > going to apply here, too... > > FYI, there is already significant change committed to the iomap > write path in the current XFS tree as a result of the changes I > mention - there is stale IOMAP detection which adds a new page ops > method and adds new error paths with a locked folio in > iomap_write_begin(). That would have belonged on the iomap-for-next branch rather than in the middle of a bunch of xfs commits. > And this other data corruption (and performance) fix for handling > zeroing over unwritten extents properly: > > https://lore.kernel.org/linux-xfs/20221201005214.3836105-1-david@xxxxxxxxxxxxx/ > > changes the way folios are looked up and instantiated in the page > cache in iomap_write_begin(). It also adds new error conditions that > need to be returned to callers so to implement conditional "folio > must be present and dirty" page cache zeroing from > iomap_zero_iter(). Those semantics would also have to be supported > by gfs2, and that greatly complicates modifying and testing iomap > core changes. > > To avoid all this, can we simple move the ->page_done() callout in > the error path and iomap_write_end() to before we unlock the folio? > You've already done that for pagecache_isize_extended(), and I can't > see anything obvious in the gfs2 ->page_done callout that > would cause issues if it is called with a locked dirty folio... Yes, I guess we can do that once pagecache_isize_extended() is replaced by folio_may_straddle_isize(). Can people please scrutinize the math in folio_may_straddle_isize() in particular? Thanks, Andreas > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >