On Wed, Jan 4, 2023 at 7:00 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> On Sat, Dec 31, 2022 at 04:09:17PM +0100, Andreas Gruenbacher wrote:
> > Eliminate the ->iomap_valid() handler by switching to a ->get_folio()
> > handler and validating the mapping there.
> >
> > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > ---
> >  fs/iomap/buffered-io.c | 25 +++++--------------------
> >  fs/xfs/xfs_iomap.c     | 37 ++++++++++++++++++++++++++-----------
> >  include/linux/iomap.h  | 22 +++++-----------------
> >  3 files changed, 36 insertions(+), 48 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 4f363d42dbaf..df6fca11f18c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -630,7 +630,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> >  	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> >  	const struct iomap *srcmap = iomap_iter_srcmap(iter);
> >  	struct folio *folio;
> > -	int status = 0;
> > +	int status;
> >
> >  	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> >  	if (srcmap != &iter->iomap)
> > @@ -646,27 +646,12 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> >  		folio = page_ops->get_folio(iter, pos, len);
> >  	else
> >  		folio = iomap_get_folio(iter, pos);
> > -	if (IS_ERR(folio))
> > -		return PTR_ERR(folio);
> > -
> > -	/*
> > -	 * Now we have a locked folio, before we do anything with it we need to
> > -	 * check that the iomap we have cached is not stale. The inode extent
> > -	 * mapping can change due to concurrent IO in flight (e.g.
> > -	 * IOMAP_UNWRITTEN state can change and memory reclaim could have
> > -	 * reclaimed a previously partially written page at this index after IO
> > -	 * completion before this write reaches this file offset) and hence we
> > -	 * could do the wrong thing here (zero a page range incorrectly or fail
> > -	 * to zero) and corrupt data.
> > -	 */
> > -	if (page_ops && page_ops->iomap_valid) {
> > -		bool iomap_valid = page_ops->iomap_valid(iter->inode,
> > -							&iter->iomap);
> > -		if (!iomap_valid) {
> > +	if (IS_ERR(folio)) {
> > +		if (folio == ERR_PTR(-ESTALE)) {
> >  			iter->iomap.flags |= IOMAP_F_STALE;
> > -			status = 0;
> > -			goto out_unlock;
> > +			return 0;
> >  		}
> > +		return PTR_ERR(folio);
>
> I wonder if this should be reworked a bit to reduce indenting:
>
> 	if (PTR_ERR(folio) == -ESTALE) {
> 		iter->iomap.flags |= IOMAP_F_STALE;
> 		return 0;
> 	}
> 	if (IS_ERR(folio))
> 		return PTR_ERR(folio);
>
> But I don't have any strong opinions about that.

This is a relatively hot path; to avoid a redundant check, the compiler
would have to convert the flattened version back into the nested one,
and I'm not sure it would always do that.

> >  	}
> >
> >  	if (pos + len > folio_pos(folio) + folio_size(folio))
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 669c1bc5c3a7..d0bf99539180 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -62,29 +62,44 @@ xfs_iomap_inode_sequence(
> >  	return cookie | READ_ONCE(ip->i_df.if_seq);
> >  }
> >
> > -/*
> > - * Check that the iomap passed to us is still valid for the given offset and
> > - * length.
> > - */
> > -static bool
> > -xfs_iomap_valid(
> > -	struct inode		*inode,
> > -	const struct iomap	*iomap)
> > +static struct folio *
> > +xfs_get_folio(
> > +	struct iomap_iter	*iter,
> > +	loff_t			pos,
> > +	unsigned		len)
> >  {
> > +	struct inode		*inode = iter->inode;
> > +	struct iomap		*iomap = &iter->iomap;
> >  	struct xfs_inode	*ip = XFS_I(inode);
> > +	struct folio		*folio;
> >
> > +	folio = iomap_get_folio(iter, pos);
> > +	if (IS_ERR(folio))
> > +		return folio;
> > +
> > +	/*
> > +	 * Now that we have a locked folio, we need to check that the iomap we
> > +	 * have cached is not stale. The inode extent mapping can change due to
> > +	 * concurrent IO in flight (e.g., IOMAP_UNWRITTEN state can change and
> > +	 * memory reclaim could have reclaimed a previously partially written
> > +	 * page at this index after IO completion before this write reaches
> > +	 * this file offset) and hence we could do the wrong thing here (zero a
> > +	 * page range incorrectly or fail to zero) and corrupt data.
> > +	 */
> >  	if (iomap->validity_cookie !=
> >  			xfs_iomap_inode_sequence(ip, iomap->flags)) {
> >  		trace_xfs_iomap_invalid(ip, iomap);
> > -		return false;
> > +		folio_unlock(folio);
> > +		folio_put(folio);
> > +		return ERR_PTR(-ESTALE);
> >  	}
> >
> >  	XFS_ERRORTAG_DELAY(ip->i_mount, XFS_ERRTAG_WRITE_DELAY_MS);
> > -	return true;
> > +	return folio;
> >  }
> >
> >  const struct iomap_page_ops xfs_iomap_page_ops = {
> > -	.iomap_valid		= xfs_iomap_valid,
> > +	.get_folio		= xfs_get_folio,
> >  };
> >
> >  int
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index dd3575ada5d1..6f8e3321e475 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -134,29 +134,17 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
> >   * When get_folio succeeds, put_folio will always be called to do any
> >   * cleanup work necessary.  put_folio is responsible for unlocking and putting
> >   * @folio.
> > + *
> > + * When an iomap is created, the filesystem can store internal state (e.g., a
> > + * sequence number) in iomap->validity_cookie. When it then detects in the
>
> I would reword this to:
>
> "When an iomap is created, the filesystem can store internal state (e.g.
> sequence number) in iomap->validity_cookie.  The get_folio handler can
> use this validity cookie to detect if the iomap is no longer up to date
> and needs to be refreshed.  If this is the case, the function should
> return ERR_PTR(-ESTALE) to retry the operation with a fresh mapping."

Yes, but "e.g." is always followed by a comma, and it's "when", not
"if" here. So how about:

 * When an iomap is created, the filesystem can store internal state (e.g., a
 * sequence number) in iomap->validity_cookie. The get_folio handler can use
 * this validity cookie to detect when the iomap needs to be refreshed because
 * it is no longer up to date. In that case, the function should return
 * ERR_PTR(-ESTALE) to retry the operation with a fresh mapping.

I've incorporated all your feedback into this branch for now:

https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-race

Thank you,
Andreas

> --D
>
> > + * get_folio handler that the iomap is no longer up to date and needs to be
> > + * refreshed, it can return ERR_PTR(-ESTALE) to trigger a retry.
> >  */
> >  struct iomap_page_ops {
> >  	struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
> >  			unsigned len);
> >  	void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
> >  			struct folio *folio);
> > -
> > -	/*
> > -	 * Check that the cached iomap still maps correctly to the filesystem's
> > -	 * internal extent map. FS internal extent maps can change while iomap
> > -	 * is iterating a cached iomap, so this hook allows iomap to detect that
> > -	 * the iomap needs to be refreshed during a long running write
> > -	 * operation.
> > -	 *
> > -	 * The filesystem can store internal state (e.g. a sequence number) in
> > -	 * iomap->validity_cookie when the iomap is first mapped to be able to
> > -	 * detect changes between mapping time and whenever .iomap_valid() is
> > -	 * called.
> > -	 *
> > -	 * This is called with the folio over the specified file position held
> > -	 * locked by the iomap code.
> > -	 */
> > -	bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
> >  };
> >
> >  /*
> > --
> > 2.38.1
> >
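[Editor's note: for readers less familiar with the kernel conventions the patch leans on, below is a minimal user-space model of the ERR_PTR(-ESTALE) validity-cookie pattern. All demo_* names are hypothetical; the real ERR_PTR/IS_ERR/PTR_ERR helpers live in include/linux/err.h, and the real sequence counter is XFS's per-inode extent map sequence. This is a sketch of the idea, not the kernel implementation.]

```c
/*
 * User-space model of the ERR_PTR(-ESTALE) convention discussed above.
 * demo_* names are hypothetical; ERR_PTR/IS_ERR/PTR_ERR mimic linux/err.h.
 */
#include <errno.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top of the pointer range. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct demo_folio { int index; };

/* Stands in for the inode-wide extent map sequence counter. */
struct demo_mapping {
	unsigned long seq;		/* bumped on every extent map change */
	unsigned long validity_cookie;	/* sampled when the iomap was created */
};

/*
 * Model of the get_folio handler: grab the folio, then reject the cached
 * mapping if the sequence counter moved since the cookie was sampled.
 */
static struct demo_folio *demo_get_folio(struct demo_mapping *map)
{
	struct demo_folio *folio = malloc(sizeof(*folio));

	if (!folio)
		return ERR_PTR(-ENOMEM);
	if (map->validity_cookie != map->seq) {
		free(folio);		/* kernel: folio_unlock + folio_put */
		return ERR_PTR(-ESTALE);
	}
	return folio;
}

/* Caller pattern mirroring iomap_write_begin(): "stale" means "retry". */
static int demo_write_begin(struct demo_mapping *map)
{
	struct demo_folio *folio = demo_get_folio(map);

	if (IS_ERR(folio)) {
		if (folio == ERR_PTR(-ESTALE)) {
			map->validity_cookie = map->seq; /* refresh mapping */
			return 1;	/* kernel: sets IOMAP_F_STALE */
		}
		return (int)PTR_ERR(folio);
	}
	free(folio);			/* write to the folio, then release */
	return 0;
}
```

The point of the convention is that the stale case travels through the same pointer return as genuine errors, so the caller needs no extra out-parameter; it only has to special-case ERR_PTR(-ESTALE) before falling back to PTR_ERR().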