On Wed, Feb 15, 2017 at 10:09:55AM -0800, Darrick J. Wong wrote: > On Wed, Feb 15, 2017 at 10:05:28AM -0500, Brian Foster wrote: > > The block reservation for the transaction allocated in > > xfs_shift_file_space() is an artifact of the original collapse range > > support. It exists to handle the case where a collapse range occurs, > > the initial extent is left shifted into a location that forms a > > contiguous boundary with the previous extent and thus the extents > > are merged. This code was subsequently refactored and reused for > > insert range (right shift) support. > > > > If an insert range occurs under low free space conditions, the > > extent at the starting offset is split before the first shift > > transaction is allocated. If the block reservation fails, this > > leaves separate, but contiguous extents around in the inode. While > > not a fatal problem, this is unexpected and will flag a warning on > > subsequent insert range operations on the inode. This problem has > > been reproduce intermittently by generic/270 running against a > > ramdisk device. > > > > Since right shift does not create new extent boundaries in the > > inode, a block reservation for extent merge is unnecessary. Update > > xfs_shift_file_space() to conditionally reserve fs blocks for left > > shift transactions only. This avoids the warning reproduced by > > generic/270. > > > > Reported-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > > fs/xfs/xfs_bmap_util.c | 20 ++++++++++---------- > > 1 file changed, 10 insertions(+), 10 deletions(-) > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > index 7c3bfaf..6be5f26 100644 > > --- a/fs/xfs/xfs_bmap_util.c > > +++ b/fs/xfs/xfs_bmap_util.c > > @@ -1385,10 +1385,16 @@ xfs_shift_file_space( > > xfs_fileoff_t stop_fsb; > > xfs_fileoff_t next_fsb; > > xfs_fileoff_t shift_fsb; > > + uint resblks; > > > > ASSERT(direction == SHIFT_LEFT || direction == SHIFT_RIGHT); > > > > if (direction == SHIFT_LEFT) { > > + /* > > + * Reserve blocks to cover potential extent merges after left > > + * shift operations. > > + */ > > + resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0); > > next_fsb = XFS_B_TO_FSB(mp, offset + len); > > stop_fsb = XFS_B_TO_FSB(mp, VFS_I(ip)->i_size); > > } else { > > @@ -1396,6 +1402,7 @@ xfs_shift_file_space( > > * If right shift, delegate the work of initialization of > > * next_fsb to xfs_bmap_shift_extent as it has ilock held. > > */ > > + resblks = 0; > > Hmmmm. I am convinced that this patch removes the most likely cause of > _trans_alloc failure, and therefore makes the g/270 failures go away. > > However, I worry that if we split the extent and _trans_alloc fails for > some other reason (e.g. ENOMEM) then we'll still end up two adjacent > bmap extents that should be combined. Granted, the only solution that I > can think of is very complicated (create a redo log item, link > everything together with the deferred ops mechanism, thereby making > right shift an atomic operation) for something that's unlikely to > happen(?) during an operation that might not be all that frequent > anyway. I'm also not sure about the implications of adjacent mergeable > bmaps -- I think we can handle it, but it's not like I've researched > this thoroughly. > > <shrug> Thoughts? > Yeah, this isn't a pure fix given the way the code is organized. I think I meant to point that out in the commit log; that technically this state is still possible, but probably not as likely to occur. I also considered killing off the warning, but it still seems useful to me for similar reasons. (Effectively, the motivation for this patch is really just to shut the test up. :). I considered error handling just enough to realize that there wasn't a simple solution. Given that this change seemed correct regardless, I figured this works for now and we can revisit if this remains a problem in practice. Beyond that... an atomic rewrite using the deferred ops stuff seems like a reasonable approach technically, but probably should be more motivated by the broader fact that afaict any of the collapse/insert range operations can fail midway through the overall operation and leave the file in a halfway shifted state with respect to the original request. Would deferred ops address that problem (e.g., what if a subsequent transaction allocation failed in that model, after one or more extents had already been shifted)? Then again, I _thought_ that all came up when the collapse range stuff was originally posted and wasn't considered a major problem to the users (either that or we didn't have a straightforward approach to make the whole thing atomic at the time) because ultimately the operation can be retried or the original state recovered from userspace... Brian > --D > > > next_fsb = NULLFSBLOCK; > > stop_fsb = XFS_B_TO_FSB(mp, offset); > > } > > @@ -1437,21 +1444,14 @@ xfs_shift_file_space( > > } > > > > while (!error && !done) { > > - /* > > - * We would need to reserve permanent block for transaction. > > - * This will come into picture when after shifting extent into > > - * hole we found that adjacent extents can be merged which > > - * may lead to freeing of a block during record update. > > - */ > > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, > > - XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp); > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, > > + &tp); > > if (error) > > break; > > > > xfs_ilock(ip, XFS_ILOCK_EXCL); > > error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot, > > - ip->i_gdquot, ip->i_pdquot, > > - XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, > > + ip->i_gdquot, ip->i_pdquot, resblks, 0, > > XFS_QMOPT_RES_REGBLKS); > > if (error) > > goto out_trans_cancel; > > -- > > 2.7.4 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html