Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 2 Mar 2023 09:12:27 +1100

On Wed, Mar 01, 2023 at 02:41:53PM +0100, Andreas Grünbacher wrote:
> Am Mi., 1. März 2023 um 02:07 Uhr schrieb Dave Chinner <david@xxxxxxxxxxxxx>:
> > On Tue, Feb 28, 2023 at 04:47:01PM -0800, Darrick J. Wong wrote:
> > > On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > >
> > > > The recent writeback corruption fixes changed the code in
> > > > xfs_discard_folio() to calculate a byte range to for punching
> > > > delalloc extents. A mistake was made in using round_up(pos) for the
> > > > end offset, because when pos points at the first byte of a block, it
> > > > does not get rounded up to point to the end byte of the block. hence
> > > > the punch range is short, and this leads to unexpected behaviour in
> > > > certain cases in xfs_bmap_punch_delalloc_range.
> > > >
> > > > e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> > > > there is no previous extent and it rounds up the punch to the end of
> > > > the delalloc extent it found at offset 0, not the end of the range
> > > > given to xfs_bmap_punch_delalloc_range().
> > > >
> > > > Fix this by handling the zero block offset case correctly.
> > > >
> > > > Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> > > > Reported-by: Pengfei Xu <pengfei.xu@xxxxxxxxx>
> > > > Found-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/xfs_aops.c | 14 ++++++++++++--
> > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > > index 41734202796f..429f63cfd7d4 100644
> > > > --- a/fs/xfs/xfs_aops.c
> > > > +++ b/fs/xfs/xfs_aops.c
> > > > @@ -466,6 +466,7 @@ xfs_discard_folio(
> > > >  {
> > > >     struct xfs_inode        *ip = XFS_I(folio->mapping->host);
> > > >     struct xfs_mount        *mp = ip->i_mount;
> > > > +   xfs_off_t               end_off;
> > > >     int                     error;
> > > >
> > > >     if (xfs_is_shutdown(mp))
> > > > @@ -475,8 +476,17 @@ xfs_discard_folio(
> > > >             "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> > > >                     folio, ip->i_ino, pos);
> > > >
> > > > -   error = xfs_bmap_punch_delalloc_range(ip, pos,
> > > > -                   round_up(pos, folio_size(folio)));
> > > > +   /*
> > > > +    * Need to be careful with the case where the pos passed in points to
> > > > +    * the first byte of the folio - rounding up won't change the value,
> > > > +    * but in all cases here we need to end offset to point to the start
> > > > +    * of the next folio.
> > > > +    */
> > > > +   if (pos == folio_pos(folio))
> > > > +           end_off = pos + folio_size(folio);
> > > > +   else
> > > > +           end_off = round_up(pos, folio_size(folio));
> > >
> > > Can this construct be simplified to:
> > >
> > >       end_off = round_up(pos + 1, folio_size(folio));
> >
> > I thought about that first, but I really, really dislike sprinkling
> > magic "+ 1" corrections into the code to address non-obvious
> > unexplained off-by-one problems.
> >
> >
> > > If pos is the first byte of the folio, it'll round end_off to the start
> > > of the next folio.  If pos is (somehow) the last byte of the folio, the
> > > first argument to round_up is already the first byte of the next folio,
> > > and rounding won't change it.
> >
> > Yup, and that's exactly the problem I had with doing this - it
> > relies on the implicit behaviour that by moving last byte of a block
> > to the first byte of the next block, round_up() won't change the end
> > offset.  i.e. the correct functioning of the code is just as
> > non-obvious with a magic "+ 1" as the incorrect functioning was
> > without it.
> 
> Hmm. On the other hand, it's not immediately obvious that the if
> statement only does an addition with rounding; it might as well do
> something more complex.

I don't really understand what you are trying to say here.

> Darrick's version avoids making things more
> complicated than they need to be.

At the expense having magic "+ 1"s in the code. Those always lead to
confusion and future off-by-one bugs.

Following the fundamental principle of obvious correctness: If we
are adding a "+ 1" to fix a bug, then the original code was bad and
needs to be reworked to remove the possibility of future, difficult
to detect off-by-one mistakes in the code.

> Other ways of doing the same thing would be:
> 
> end_off = round_down(pos, folio_size(folio)) + folio_size(folio);
> end_off = folio_pos(folio) + folio_size(folio);

That last one is the best suggestion I've heard, because the end
offset is always the first byte of the next folio. That's an
explicit, unambiguous encoding of the exact requirement we need to
fulfil. It does not require rounding, magic "+ 1" values to be
added, etc.

Thanks, Andreas, I'll change it to that.

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx