From: Dave Chinner <dchinner@xxxxxxxxxx> xfs_da3_split() has to handle all thre versions of the directory/attribute btree structure. The attr tree is v1, the dir tre is v2 or v3. The main difference between the v1 and v2/3 trees is the way tree nodes are split - in the v1 tree we can require a double split to occur because the object to be inserted may be larger than the space made by splitting a leaf. In this case we need to do a double split - one to split the full leaf, then another to allocate an empty leaf block in the correct location for the new entry. This does not happen with dir (v2/v3) formats as the objects being inserted are always guaranteed to fit into the new space in the split blocks. The problem is that when we are doing the first root split, there can be three leaf blocks that need to be updated for an attr tree, but there will ony ever be two blocks for the dir tree. The code, however, expects that there will always be an extra block (i.e. three blocks) that needs updating. When rebuilding broken directories, xfs_repair trips over this condition in the directroy code and SEGVs because state->extrablk.bp == NULL. i.e. there is no extra block. Indeed, for directories they *may* be an extra block on this buffer pointer. However, it's guaranteed not to be a leaf block (i.e. a directory data block) - the directory code only ever places hash index or free space blocks in this pointer (as a cursor of sorts), and so to use it as a directory data block will immediately corrupt the directory. This manifests itself in repair as a directory corruption in a repaired directory, leaving the directory rebuild incomplete. This is a dir v2 zero-day bug - it was in the initial dir v2 commit that was made back in February 1998. Fix this by making the update code aware of whether the extra block is a valid node for linking into the tree, rather than asserting (incorrectly) that the extra block must be valid. This brings the code in line with xfs_da3_node_split() which, from the first dir v2 commit, handled the conditional extra block case correctly.... Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> --- libxfs/xfs_da_btree.c | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c index bf5fe21..58a0361 100644 --- a/libxfs/xfs_da_btree.c +++ b/libxfs/xfs_da_btree.c @@ -356,6 +356,7 @@ xfs_da3_split( int action = 0; int error; int i; + bool use_extra = false; trace_xfs_da_split(state->args); @@ -395,6 +396,7 @@ xfs_da3_split( * Entry wouldn't fit, split the leaf again. */ state->extravalid = 1; + use_extra = true; if (state->inleaf) { state->extraafter = 0; /* before newblk */ trace_xfs_attr_leaf_split_before(state->args); @@ -454,8 +456,13 @@ xfs_da3_split( /* * Update pointers to the node which used to be block 0 and * just got bumped because of the addition of a new root node. - * There might be three blocks involved if a double split occurred, - * and the original block 0 could be at any position in the list. + * There might be three blocks involved if a double split occurred and + * in this case use_extra will be set. This can only happen for attr + * trees, as extrablk for dir trees can only contain data or free space + * blocks, never leaf/node blocks. + * + * Note that the original block 0 could be at any position in the list + * of blocks in the tree. * * Note: the magic numbers and sibling pointers are in the same * physical place for both v2 and v3 headers (by design). Hence it @@ -464,31 +471,37 @@ xfs_da3_split( */ node = oldblk->bp->b_addr; if (node->hdr.info.forw) { + bp = NULL; if (be32_to_cpu(node->hdr.info.forw) == addblk->blkno) { bp = addblk->bp; - } else { + } else if (use_extra) { ASSERT(state->extravalid); bp = state->extrablk.bp; } - node = bp->b_addr; - node->hdr.info.back = cpu_to_be32(oldblk->blkno); - xfs_trans_log_buf(state->args->trans, bp, - XFS_DA_LOGRANGE(node, &node->hdr.info, - sizeof(node->hdr.info))); + if (bp) { + node = bp->b_addr; + node->hdr.info.back = cpu_to_be32(oldblk->blkno); + xfs_trans_log_buf(state->args->trans, bp, + XFS_DA_LOGRANGE(node, &node->hdr.info, + sizeof(node->hdr.info))); + } } node = oldblk->bp->b_addr; if (node->hdr.info.back) { + bp = NULL; if (be32_to_cpu(node->hdr.info.back) == addblk->blkno) { bp = addblk->bp; } else { ASSERT(state->extravalid); bp = state->extrablk.bp; } - node = bp->b_addr; - node->hdr.info.forw = cpu_to_be32(oldblk->blkno); - xfs_trans_log_buf(state->args->trans, bp, - XFS_DA_LOGRANGE(node, &node->hdr.info, - sizeof(node->hdr.info))); + if (bp) { + node = bp->b_addr; + node->hdr.info.forw = cpu_to_be32(oldblk->blkno); + xfs_trans_log_buf(state->args->trans, bp, + XFS_DA_LOGRANGE(node, &node->hdr.info, + sizeof(node->hdr.info))); + } } addblk->bp = NULL; return 0; -- 2.5.0 _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs