On Tue, Sep 23, 2014 at 03:28:58PM -0400, Brian Foster wrote: > xfs_bmap_del_extent() handles extent removal from the in-core and > on-disk extent lists. When removing a delalloc range, it updates the > indirect block reservation appropriately based on the removal. It > currently enforces that the new indirect block reservation is less than > or equal to the original. This is normally the case in all situations > except for when the removed range creates a hole in a single delalloc > extent, thus splitting a single delalloc extent in two. > > The indirect block reservation is divided evenly between the two new > extents in this scenario. However, it is possible with small enough > extents to split an indlen==1 extent into two such slightly smaller > extents. This leaves one extent with 0 indirect blocks and leads to > assert failures in other areas (e.g., xfs_bunmapi() if the extent > happens to be removed). I had long suspected we had an issue in this code, but was never able to nail down a reproducer that triggered it. Do you have a reproducer, or did you find this by reading/tracing the code? > Refactor xfs_bunmapi() to make the updates that must be consistent > against the inode (e.g., delalloc block counter, quota reservation) > right before the extent is deleted. Move the sb block counter update > after the extent is deleted and update xfs_bmap_del_extent() to steal > some blocks from the freed extent if a larger overall indirect > reservation is required by the extent removal. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > Hi all, > > I'm seeing the following assert more frequently with fsx and the recent > xfs_free_file_space() changes (at least on 1k fsb fs'): > > XFS: Assertion failed: startblockval(del.br_startblock) > 0, file: fs/xfs/libxfs/xfs_bmap.c, line: 5281 > > This occurs for the reason described in the commit log description. This > is a quick experiment I wanted to test to verify the problem goes away > (so far, so good). Very lightly tested so far. I suspect it's also the cause of these occasional assert failures that I see: XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 327 during delalloc conversion because there wasn't a space reservation for the blocks allocated (i.e. indlen was zero) and so we overrun the transaction block reservation. > I'm not too fond of changing br_blockcount like this. It seems like a > potential landmine. Alternative approaches could be to kill the assert > if we think indlen==0 extents is not a huge problem in this scenario, A delalloc extent always needs a reservation.... > update the counters independently in xfs_bmap_del_extent() to get the > needed blocks or pass a separate output parameter rather than messing > with br_blockcount (e.g., '*ofreedblks'). The latter might mean we could > just move the entire hunk that updates the inode/quota and whatnot > rather than splitting it up. > > I wanted to put this on the list for comments. Thoughts? Other ideas? > Thanks. > > Brian > > fs/xfs/libxfs/xfs_bmap.c | 64 ++++++++++++++++++++++++++++++------------------ > 1 file changed, 40 insertions(+), 24 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 86df952..1858e6b 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -4969,6 +4969,11 @@ xfs_bmap_del_extent( > temp2 = xfs_bmap_worst_indlen(ip, temp2); > new.br_startblock = nullstartblock((int)temp2); > da_new = temp + temp2; > + /* pull blocks from extent to make up the difference */ > + while (da_new > da_old && del->br_blockcount) { > + del->br_blockcount--; > + da_new--; > + } > while (da_new > da_old) { > if (temp) { > temp--; So this is the crux of the fix, right? The block stealing from the deleted extent? I have no objections to doing this given that we already munge the indirect reservations inteh loop below, but I think we should make this cleaner and more understandable. i.e. refactor the indlen adjustment code into a helper, and integrate the block stealing into the existing adjustment loop? > @@ -5277,9 +5282,37 @@ xfs_bunmapi( > goto nodelete; > } > } > + > + /* > + * If it's the case where the directory code is running > + * with no block reservation, and the deleted block is in > + * the middle of its extent, and the resulting insert > + * of an extent would cause transformation to btree format, > + * then reject it. The calling code will then swap > + * blocks around instead. > + * We have to do this now, rather than waiting for the > + * conversion to btree format, since the transaction > + * will be dirty. > + */ > + if (!wasdel && xfs_trans_get_block_res(tp) == 0 && > + XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS && > + XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */ > + XFS_IFORK_MAXEXT(ip, whichfork) && > + del.br_startoff > got.br_startoff && > + del.br_startoff + del.br_blockcount < > + got.br_startoff + got.br_blockcount) { > + error = -ENOSPC; > + goto error0; > + } > + > + /* > + * Unreserve quota and update realtime free space, if > + * appropriate. If delayed allocation, update the inode delalloc > + * counter now and wait to update the sb counters as > + * xfs_bmap_del_extent() might need to borrow some blocks. > + */ > if (wasdel) { > ASSERT(startblockval(del.br_startblock) > 0); > - /* Update realtime/data freespace, unreserve quota */ > if (isrt) { > xfs_filblks_t rtexts; > Perhaps this should be separated into it's own patch that is applied first? It doesn't seem to be directly related to the accounting fix... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs