Re: [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am Mo., 9. Jan. 2023 um 23:58 Uhr schrieb Dave Chinner <david@xxxxxxxxxxxxx>:
> On Mon, Jan 09, 2023 at 07:45:27PM +0100, Andreas Gruenbacher wrote:
> > On Sun, Jan 8, 2023 at 10:59 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > On Sun, Jan 08, 2023 at 08:40:32PM +0100, Andreas Gruenbacher wrote:
> > > > Eliminate the ->iomap_valid() handler by switching to a ->get_folio()
> > > > handler and validating the mapping there.
> > > >
> > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > >
> > > I think this is wrong.
> > >
> > > The ->iomap_valid() function handles a fundamental architectural
> > > issue with cached iomaps: the iomap can become stale at any time
> > > whilst it is in use by the iomap core code.
> > >
> > > The current problem it solves in the iomap_write_begin() path has to
> > > do with writeback and memory reclaim races over unwritten extents,
> > > but the general case is that we must be able to check the iomap
> > > at any point in time to assess it's validity.
> > >
> > > Indeed, we also have this same "iomap valid check" functionality in the
> > > writeback code as cached iomaps can become stale due to racing
> > > writeback, truncated, etc. But you wouldn't know it by looking at the iomap
> > > writeback code - this is currently hidden by XFS by embedding
> > > the checks into the iomap writeback ->map_blocks function.
> > >
> > > That is, the first thing that xfs_map_blocks() does is check if the
> > > cached iomap is valid, and if it is valid it returns immediately and
> > > the iomap writeback code uses it without question.
> > >
> > > The reason that this is embedded like this is that the iomap did not
> > > have a validity cookie field in it, and so the validity information
> > > was wrapped around the outside of the iomap_writepage_ctx and the
> > > filesystem has to decode it from that private wrapping structure.
> > >
> > > However, the validity information iin the structure wrapper is
> > > indentical to the iomap validity cookie,
> >
> > Then could that part of the xfs code be converted to use
> > iomap->validity_cookie so that struct iomap_writepage_ctx can be
> > eliminated?
>
> Yes, that is the plan.
>
> >
> > > and so the direction I've
> > > been working towards is to replace this implicit, hidden cached
> > > iomap validity check with an explicit ->iomap_valid call and then
> > > only call ->map_blocks if the validity check fails (or is not
> > > implemented).
> > >
> > > I want to use the same code for all the iomap validity checks in all
> > > the iomap core code - this is an iomap issue, the conditions where
> > > we need to check for iomap validity are different for depending on
> > > the iomap context being run, and the checks are not necessarily
> > > dependent on first having locked a folio.
> > >
> > > Yes, the validity cookie needs to be decoded by the filesystem, but
> > > that does not dictate where the validity checking needs to be done
> > > by the iomap core.
> > >
> > > Hence I think removing ->iomap_valid is a big step backwards for the
> > > iomap core code - the iomap core needs to be able to formally verify
> > > the iomap is valid at any point in time, not just at the point in
> > > time a folio in the page cache has been locked...
> >
> > We don't need to validate an iomap "at any time". It's two specific
> > places in the code in which we need to check, and we're not going to
> > end up with ten more such places tomorrow.
>
> Not immediately, but that doesn't change the fact this is not a
> filesystem specific issue - it's an inherent characteristic of
> cached iomaps and unsynchronised extent state changes that occur
> outside exclusive inode->i_rwsem IO context (e.g. in writeback and
> IO completion contexts).
>
> Racing mmap + buffered writes can expose these state changes as the
> iomap bufferred write IO path is not serialised against the iomap
> mmap IO path except via folio locks. Hence a mmap page fault can
> invalidate a cached buffered write iomap by causing a hole ->
> unwritten, hole -> delalloc or hole -> written conversion in the
> middle of the buffered write range. The buffered write still has a
> hole mapping cached for that entire range, and it is now incorrect.
>
> If the mmap write happens to change extent state at the trailing
> edge of a partial buffered write, data corruption will occur if we
> race just right with writeback and memory reclaim. I'm pretty sure
> that this corruption can be reporduced on gfs2 if we try hard enough
> - generic/346 triggers the mmap/write race condition, all that is
> needed from that point is for writeback and reclaiming pages at
> exactly the right time...
>
> > I'd prefer to keep those
> > filesystem internals in the filesystem specific code instead of
> > exposing them to the iomap layer. But that's just me ...
>
> My point is that there is nothing XFS specific about these stale
> cached iomap race conditions, nor is it specifically related to
> folio locking. The folio locking inversions w.r.t. iomap caching and
> the interactions with writeback and reclaim are simply the
> manifestation that brought the issue to our attention.
>
> This is why I think hiding iomap validation filesystem specific page
> cache allocation/lookup functions is entirely the wrong layer to be
> doing iomap validity checks. Especially as it prevents us from
> adding more validity checks in the core infrastructure when we need
> them in future.
>
> AFAIC, an iomap must carry with it a method for checking
> that it is still valid. We need it in the write path, we need it in
> the writeback path. If we want to relax the restrictions on clone
> operations (e.g. shared locking on the source file), we'll need to
> be able to detect stale cached iomaps in those paths, too. And I
> haven't really thought through all the implications of shared
> locking on buffered writes yet, but that may well require more
> checks in other places as well.
>
> > If we ignore this particular commit for now, do you have any
> > objections to the patches in this series? If not, it would be great if
> > we could add the other patches to iomap-for-next.
>
> I still don't like moving page cache operations into individual
> filesystems, but for the moment I can live with the IOMAP_NOCREATE
> hack to drill iomap state through the filesystem without the
> filesystem being aware of it.

Alright, works for me. Darrick?

> > By the way, I'm still not sure if gfs2 is affected by this whole iomap
> > validation drama given that it neither implements unwritten extents
> > nor delayed allocation. This is a mess.
>
> See above - I'm pretty sure it will be, but it may be very difficult
> to expose. After all, it's taken several years before anyone noticed
> this issue with XFS, even though we were aware of the issue of stale
> cached iomaps causing data corruption in the writeback path....

Okay, that's all pretty ugly. Thanks a lot for the detailed explanation.

Cheers,
Andreas

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux