On Fri, Feb 05, 2016 at 10:05:04AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> xfs_da3_split() has to handle all three versions of the
> directory/attribute btree structure. The attr tree is v1, the dir
> tree is v2 or v3. The main difference between the v1 and v2/3 trees
> is the way tree nodes are split - in the v1 tree we can require a
> double split to occur because the object to be inserted may be
> larger than the space made by splitting a leaf. In this case we need
> to do a double split - one to split the full leaf, then another to
> allocate an empty leaf block in the correct location for the new
> entry. This does not happen with dir (v2/v3) formats as the objects
> being inserted are always guaranteed to fit into the new space in
> the split blocks.
>
> Indeed, for directories there *may* be an extra block on this buffer
> pointer. However, it's guaranteed not to be a leaf block (i.e. a
> directory data block) - the directory code only ever places hash
> index or free space blocks in this pointer (as a cursor of
> sorts), and so to use it as a directory data block will immediately
> corrupt the directory.
>
> The problem is that the code assumes that there may be extra blocks
> that we need to link into the tree once we've split the root, but
> this is not true for either dir or attr trees, because the extra
> attr block is always consumed by the last node split before we split
> the root. Hence the linking in an extra block is always wrong at the
> root split level, and this manifests itself in repair as a directory
> corruption in a repaired directory, leaving the directory rebuild
> incomplete.
>
> This is a dir v2 zero-day bug - it was in the initial dir v2 commit
> that was made back in February 1998.
>
> Fix this by ensuring the linking of the blocks after the root split
> never tries to make use of the extra blocks that may be held in the
> cursor.
> They are held there for other purposes and should never be
> touched by the root splitting code.
>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>  libxfs/xfs_da_btree.c | 59 +++++++++++++++++++++++++--------------------------
>  1 file changed, 29 insertions(+), 30 deletions(-)
>
> diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
> index bf5fe21..25072c7 100644
> --- a/libxfs/xfs_da_btree.c
> +++ b/libxfs/xfs_da_btree.c
> @@ -351,7 +351,6 @@ xfs_da3_split(
>  	struct xfs_da_state_blk	*newblk;
>  	struct xfs_da_state_blk	*addblk;
>  	struct xfs_da_intnode	*node;
> -	struct xfs_buf		*bp;
>  	int			max;
>  	int			action = 0;
>  	int			error;
> @@ -392,7 +391,9 @@ xfs_da3_split(
>  			break;
>  		}
>  		/*
> -		 * Entry wouldn't fit, split the leaf again.
> +		 * Entry wouldn't fit, split the leaf again. The new
> +		 * extrablk will be consumed by xfs_da3_node_split if
> +		 * the node is split.
>  		 */
>  		state->extravalid = 1;
>  		if (state->inleaf) {
> @@ -441,6 +442,14 @@ xfs_da3_split(
>  		return 0;
>
>  	/*
> +	 * xfs_da3_node_split() should have consumed any extra blocks we added
> +	 * during a double leaf split in the attr fork. This is guaranteed as
> +	 * we can't be here if the attr fork only has a single leaf block.
> +	 */
> +	ASSERT(state->extravalid == 0 ||
> +	       state->path.blk[max].magic == XFS_DIR2_LEAFN_MAGIC);
> +
> +	/*
>  	 * Split the root node.
>  	 */
>  	ASSERT(state->path.active == 0);
> @@ -452,43 +461,33 @@ xfs_da3_split(
>  	}
>
>  	/*
> -	 * Update pointers to the node which used to be block 0 and
> -	 * just got bumped because of the addition of a new root node.
> -	 * There might be three blocks involved if a double split occurred,
> -	 * and the original block 0 could be at any position in the list.
> +	 * Update pointers to the node which used to be block 0 and just got
> +	 * bumped because of the addition of a new root node. Note that the
> +	 * original block 0 could be at any position in the list of blocks in
> +	 * the tree.
>  	 *
> -	 * Note: the magic numbers and sibling pointers are in the same
> -	 * physical place for both v2 and v3 headers (by design). Hence it
> -	 * doesn't matter which version of the xfs_da_intnode structure we use
> -	 * here as the result will be the same using either structure.
> +	 * Note: the magic numbers and sibling pointers are in the same physical
> +	 * place for both v2 and v3 headers (by design). Hence it doesn't matter
> +	 * which version of the xfs_da_intnode structure we use here as the
> +	 * result will be the same using either structure.
>  	 */
>  	node = oldblk->bp->b_addr;
>  	if (node->hdr.info.forw) {
> -		if (be32_to_cpu(node->hdr.info.forw) == addblk->blkno) {
> -			bp = addblk->bp;
> -		} else {
> -			ASSERT(state->extravalid);
> -			bp = state->extrablk.bp;
> -		}
> -		node = bp->b_addr;
> +		ASSERT(be32_to_cpu(node->hdr.info.forw) == addblk->blkno);
> +		node = addblk->bp->b_addr;
>  		node->hdr.info.back = cpu_to_be32(oldblk->blkno);
> -		xfs_trans_log_buf(state->args->trans, bp,
> -		    XFS_DA_LOGRANGE(node, &node->hdr.info,
> -		    sizeof(node->hdr.info)));
> +		xfs_trans_log_buf(state->args->trans, addblk->bp,
> +				  XFS_DA_LOGRANGE(node, &node->hdr.info,
> +				  sizeof(node->hdr.info)));
>  	}
>  	node = oldblk->bp->b_addr;
>  	if (node->hdr.info.back) {
> -		if (be32_to_cpu(node->hdr.info.back) == addblk->blkno) {
> -			bp = addblk->bp;
> -		} else {
> -			ASSERT(state->extravalid);
> -			bp = state->extrablk.bp;
> -		}
> -		node = bp->b_addr;
> +		ASSERT(be32_to_cpu(node->hdr.info.back) == addblk->blkno);
> +		node = addblk->bp->b_addr;
>  		node->hdr.info.forw = cpu_to_be32(oldblk->blkno);
> -		xfs_trans_log_buf(state->args->trans, bp,
> -		    XFS_DA_LOGRANGE(node, &node->hdr.info,
> -		    sizeof(node->hdr.info)));
> +		xfs_trans_log_buf(state->args->trans, addblk->bp,
> +				  XFS_DA_LOGRANGE(node, &node->hdr.info,
> +				  sizeof(node->hdr.info)));
>  	}
>  	addblk->bp = NULL;
>  	return 0;
> --
> 2.5.0
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
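
For anyone following the sibling-pointer change in the last hunk, here is a minimal stand-alone sketch of the relinking rule the patch enforces. It is not the libxfs code: `struct blk` and `relink_after_root_split()` are hypothetical names made up for illustration, block numbers are kept in host byte order (the real headers are big-endian), and buffer logging is left out. The only point it shows is that once the root has been split, any sibling of the relocated old root must be the block the split just added, never an extra block held in the cursor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a da btree node header. */
struct blk {
	uint32_t blkno;	/* this block's number */
	uint32_t forw;	/* next sibling's block number, 0 if none */
	uint32_t back;	/* previous sibling's block number, 0 if none */
};

/*
 * oldblk holds the former root contents at their new block number and
 * addblk is the block created by splitting it. Fix the one sibling
 * pointer in addblk that still refers to the old location (block 0).
 */
static void relink_after_root_split(const struct blk *oldblk,
				    struct blk *addblk)
{
	if (oldblk->forw) {
		/* The only possible right sibling is addblk. */
		assert(oldblk->forw == addblk->blkno);
		addblk->back = oldblk->blkno;
	}
	if (oldblk->back) {
		/* The only possible left sibling is addblk. */
		assert(oldblk->back == addblk->blkno);
		addblk->forw = oldblk->blkno;
	}
}
```

With the old root moved to, say, block 5 and a forward pointer to a new block 7, calling the function sets block 7's back pointer to 5 and leaves its forward pointer alone. A sibling pointer that names any other block trips the assert instead of silently linking in a cursor block.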