[XFS updates] XFS development tree branch, for-linus, updated. v3.7-rc1-10-g6ce377a

xfs@xxxxxxxxxxx · Thu, 8 Nov 2012 11:30:04 -0600

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, for-linus has been updated
  6ce377a xfs: fix reading of wrapped log data
  03b1293 xfs: fix buffer shudown reference count mismatch
  4b62acf xfs: don't vmap inode cluster buffers during free
  ca250b1 xfs: invalidate allocbt blocks moved to the free list
  1e7acbb xfs: silence uninitialised f.file warning.
  eaef854 xfs: growfs: don't read garbage for new secondary superblocks
  1f3c785 xfs: move allocation stack switch up to xfs_bmapi_allocate
  326c035 xfs: introduce XFS_BMAPI_STACK_SWITCH
  408cc4e xfs: zero allocation_args on the kernel stack
  7e9620f xfs: only update the last_sync_lsn when a transaction completes
      from  ddffeb8c4d0331609ef2581d84de4d763607bd37 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 6ce377afd1755eae5c93410ca9a1121dfead7b87
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 2 11:38:44 2012 +1100

    xfs: fix reading of wrapped log data

    Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
    3.0-rc1 introduced a regression when recovering log buffers that
    wrapped around the end of log. The second part of the log buffer at
    the start of the physical log was being read into the header buffer
    rather than the data buffer, and hence recovery was seeing garbage
    in the data buffer when it got to the region of the log buffer that
    was incorrectly read.

    Cc: <stable@xxxxxxxxxxxxxxx> # 3.0.x, 3.2.x, 3.4.x, 3.6.x
    Reported-by: Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
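
To illustrate the fix, here is a minimal sketch of reading a record that wraps around the end of a circular log, assuming a hypothetical byte-array log of LOG_SIZE bytes; log_read_wrapped() is not an XFS function. The point from the commit is that both halves of a wrapped read must land in the same data buffer, with the part read from the start of the physical log placed immediately after the first half (not in a separate header buffer).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a circular log; not XFS code. */
#define LOG_SIZE 16

/*
 * Copy len bytes starting at physical offset start into dst, wrapping
 * at the end of the log. Assumes start < LOG_SIZE and len <= LOG_SIZE.
 * Both parts go into the same destination buffer.
 */
static void log_read_wrapped(const unsigned char *ring, size_t start,
                             size_t len, unsigned char *dst)
{
    size_t first = len;

    if (start + len > LOG_SIZE)
        first = LOG_SIZE - start;          /* bytes before the wrap point */

    memcpy(dst, ring + start, first);
    if (first < len)                       /* wrapped part, from offset 0 */
        memcpy(dst + first, ring, len - first);
}
```

Reading 6 bytes from offset 13 of the 16-byte log yields bytes 13 to 15 followed by bytes 0 to 2.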

commit 03b1293edad462ad1ad62bcc5160c76758e450d5
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Fri Nov 2 14:23:12 2012 +1100

    xfs: fix buffer shudown reference count mismatch

    When we shut down the filesystem, we have to unpin and free all the
    buffers currently active in the CIL. To do this we unpin and remove
    them in one operation as a result of a failed iclogbuf write. For
    buffers, we do this removal via a simulated IO completion after
    marking the buffer stale.

    At the time we do this, we have two references to the buffer - the
    active LRU reference and the buf log item.  The LRU reference is
    removed by marking the buffer stale, and the active CIL reference is
    removed by the xfs_buf_iodone() callback that is run by
    xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
    callback).

    However, ioend processing requires one more reference - that of the
    IO that it is completing. We don't have this reference, so we free
    the buffer prematurely and use it after it is freed. For buffers
    marked with XBF_ASYNC, this leads to assert failures in
    xfs_buf_rele() on debug kernels because the b_hold count is zero.

    Fix this by making sure we take the necessary IO reference before
    starting IO completion processing on the stale buffer, and set the
    XBF_ASYNC flag to ensure that IO completion processing removes all
    the active references from the buffer to ensure it is fully torn
    down.

    Cc: <stable@xxxxxxxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
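
A toy reference-count model of the shutdown path described above. struct buf, buf_rele() and shutdown_stale_buf() are hypothetical stand-ins for b_hold, xfs_buf_rele() and the simulated IO completion; only the counting is modelled. Starting from the two references (LRU and buf log item), async completion processing drops one reference too many unless the IO reference is taken first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer model; names are illustrative only. */
struct buf {
    int  hold;      /* reference count, cf. b_hold */
    bool async;     /* cf. XBF_ASYNC: ioend drops the IO reference */
    bool freed;
};

/* Drop one reference; returns false on underflow (a debug assert). */
static bool buf_rele(struct buf *bp)
{
    if (bp->hold <= 0)
        return false;
    if (--bp->hold == 0)
        bp->freed = true;
    return true;
}

/*
 * Tear down a stale buffer holding an LRU and a log item reference.
 * take_io_ref selects the fixed behaviour. Returns true if every
 * release was balanced.
 */
static bool shutdown_stale_buf(struct buf *bp, bool take_io_ref)
{
    bool ok = true;

    if (take_io_ref) {
        bp->hold++;             /* reference for the simulated IO */
        bp->async = true;       /* ensure ioend drops it */
    }
    ok &= buf_rele(bp);         /* marking stale: LRU reference */
    ok &= buf_rele(bp);         /* iodone callback: log item reference */
    if (bp->async)
        ok &= buf_rele(bp);     /* ioend: IO reference */
    return ok;
}
```

With the IO reference taken up front, the count reaches zero exactly on the last release instead of underflowing.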

commit 4b62acfe99e158fb7812982d1cf90a075710a92c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 2 11:38:42 2012 +1100

    xfs: don't vmap inode cluster buffers during free

    Inode buffers do not need to be mapped as inodes are read or written
    directly from/to the pages underlying the buffer. This fixes a
    regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
    default behaviour").

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit ca250b1b3d711936d7dae9e97871f2261347f82d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Nov 2 11:38:41 2012 +1100

    xfs: invalidate allocbt blocks moved to the free list

    When we free a block from the alloc btree, we move it to the
    freelist held in the AGFL and mark it busy in the busy extent tree.
    This typically happens when we merge btree blocks.

    Once the transaction is committed and checkpointed, the block can
    remain on the free list for an indefinite amount of time.  Now, this
    isn't the end of the world at this point - if the free list is
    shortened, the buffer is invalidated in the transaction that moves
    it back to free space. If the buffer is allocated as metadata from
    the free list, then all the modifications get logged, and we have
    no issues, either. And if it gets allocated as userdata direct from
    the freelist, it gets invalidated and so will never get written.

    However, during the time it sits on the free list, pressure on the
    log can cause the AIL to be pushed and the buffer that covers the
    block gets pushed for write. IOWs, we end up writing a freed
    metadata block to disk. Again, this isn't the end of the world
    because we know from the above we are only writing to free space.

    The problem, however, is for validation callbacks. If the block was
    an old btree root block, then the level of the block is going to be
    higher than the current tree root, and so will fail validation.
    There may be other inconsistencies in the block as well, and
    currently we don't care because the block is in free space. Shutting
    down the filesystem because a freed block doesn't pass write
    validation, OTOH, is rather unfriendly.

    So, make sure we always invalidate buffers as they move from the
    free space trees to the free list so that we guarantee they never
    get written to disk while on the free list.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Phil White <pwhite@xxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
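
A minimal sketch of the invariant this commit establishes, using a hypothetical meta_buf structure rather than the real XFS buffer and AIL code: once a block moves to the free list its buffer is invalidated, so a later AIL push skips it instead of writing (and validating) a freed metadata block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory metadata buffer; not the XFS structures. */
struct meta_buf {
    bool dirty;    /* has changes that may be written back */
    bool stale;    /* invalidated: must never reach disk */
    int  writes;   /* number of writebacks performed */
};

/* Moving a freed btree block to the free list invalidates its buffer. */
static void move_to_freelist(struct meta_buf *bp)
{
    bp->stale = true;
    bp->dirty = false;
}

/* An AIL push writes back dirty buffers but skips stale ones. */
static void ail_push(struct meta_buf *bp)
{
    if (bp->dirty && !bp->stale) {
        bp->writes++;
        bp->dirty = false;
    }
}
```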

commit 1e7acbb7bc1ae7c1c62fd1310b3176a820225056
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Oct 25 17:22:30 2012 +1100

    xfs: silence uninitialised f.file warning.

    Silence an uninitialised variable build warning introduced by 2903ff0
    ("switch simple cases of fget_light to fdget"). gcc is not smart
    enough to work out that the variable is never used uninitialised,
    and the commit removed the initialisation at declaration that the
    old variable had.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
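
A hypothetical reduction of the pattern behind the warning: struct fd_sketch and lookup() stand in for struct fd and the ioctl code, and the exact initialiser used in the actual fix is not shown here. f.file is only read on the branch that assigns it, which gcc may fail to prove; initialising the structure at declaration silences the warning without changing behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct fd. */
struct fd_sketch {
    void        *file;
    unsigned int flags;
};

/* Returns 1 if a file was found, -1 if the handle was bad, 0 otherwise. */
static int lookup(int by_fd, void *filp)
{
    struct fd_sketch f = { 0 };   /* initialised at declaration */

    if (by_fd)
        f.file = filp;

    if (by_fd && f.file == NULL)
        return -1;                /* cf. -EBADF */
    return by_fd ? 1 : 0;
}
```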

commit eaef854335ce09956e930fe4a193327417edc6c9
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Oct 9 14:50:52 2012 +1100

    xfs: growfs: don't read garbage for new secondary superblocks

    When updating new secondary superblocks in a growfs operation, the
    superblock buffer is read from the newly grown region of the
    underlying device. This is not guaranteed to be zero, so it violates
    the underlying assumption that the unused parts of superblocks are
    zero filled. Get a new buffer for these secondary superblocks to
    ensure that the unused regions are zero filled correctly.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
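
A sketch of the growfs fix, assuming a 512-byte sector and a hypothetical SB_USED count of bytes covered by defined superblock fields: the secondary superblock image is built in a freshly zeroed buffer (calloc() here, standing in for getting a new buffer) instead of on top of whatever the newly grown device region happens to contain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SECT_SIZE 512
#define SB_USED   208   /* hypothetical: bytes holding defined fields */

/*
 * Build a secondary superblock image from the primary's fields.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static unsigned char *make_secondary_sb(const unsigned char *primary)
{
    unsigned char *buf = calloc(1, SECT_SIZE);   /* zero-filled buffer */

    if (buf)
        memcpy(buf, primary, SB_USED);           /* defined fields only */
    return buf;
}
```

Everything past SB_USED stays zero, which is the assumption the unused parts of the superblock rely on.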

commit 1f3c785c3adb7d2b109ec7c8f10081d1294b03d3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Oct 5 11:06:59 2012 +1000

    xfs: move allocation stack switch up to xfs_bmapi_allocate

    Switching stacks at xfs_alloc_vextent() can cause deadlocks when we
    run out of worker threads on the allocation workqueue. This can
    occur because xfs_bmap_btalloc can make multiple calls to
    xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
    return with the AGF locked in the current allocation transaction.

    If we then need to make another allocation, and all the allocation
    worker contexts are exhausted because they are blocked waiting for
    the AGF lock, the holder of the AGF cannot get its xfs_alloc_vextent()
    work completed to release the AGF. Hence allocation effectively
    deadlocks.

    To avoid this, move the stack switch one layer up to
    xfs_bmapi_allocate() so that all of the allocation attempts in a
    single switched stack transaction occur in a single worker context.
    This avoids the problem of an allocation being blocked waiting for
    a worker thread whilst holding the AGF.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

commit 326c03555b914ff153ba5b40df87fd6e28e7e367
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Oct 5 11:06:58 2012 +1000

    xfs: introduce XFS_BMAPI_STACK_SWITCH

    Certain allocation paths through xfs_bmapi_write() are in situations
    where we have limited stack available. These are almost always in
    the buffered IO writeback path when converting delayed allocation
    extents to real extents.

    The current stack switch occurs for userdata allocations, which
    means we also do stack switches for preallocation, direct IO and
    unwritten extent conversion, even though these call chains have never
    been implicated in a stack overrun.

    Hence, let's target just the single stack overrun offender for stack
    switches. To do that, introduce an XFS_BMAPI_STACK_SWITCH flag that
    the caller can pass to xfs_bmapi_write() to indicate it should switch
    stacks if it needs to do allocation.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
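
A sketch of the policy change, with hypothetical flag values (BMAPI_PREALLOC, BMAPI_STACK_SWITCH) that are not the real XFS_BMAPI_* bit assignments: the old rule switched stacks for every userdata allocation, while the new rule switches only when the caller opts in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; not the real XFS_BMAPI_* values. */
#define BMAPI_PREALLOC      (1 << 1)
#define BMAPI_STACK_SWITCH  (1 << 2)

/* Decide whether an allocation should run on a separate worker stack. */
static bool want_stack_switch(int flags, bool userdata, bool old_rule)
{
    if (old_rule)
        return userdata;                      /* every userdata allocation */
    return (flags & BMAPI_STACK_SWITCH) != 0; /* explicit opt-in only */
}
```

Under the new rule a preallocation request no longer switches stacks unless its caller passes the opt-in flag.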

commit 408cc4e97a3ccd172d2d676e4b585badf439271b
Author: Mark Tinguely <tinguely@xxxxxxx>
Date:   Thu Sep 20 13:16:45 2012 -0500

    xfs: zero allocation_args on the kernel stack

    Zero the kernel stack space that makes up the xfs_alloc_arg structures.

    Signed-off-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
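
A minimal sketch of zeroing an on-stack argument structure before use; struct alloc_arg_sketch is a hypothetical subset of xfs_alloc_arg and init_args() is illustrative. memset() is one way to do it; declaring the variable with a { 0 } initialiser achieves the same for a freshly declared structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of an allocation-arguments structure. */
struct alloc_arg_sketch {
    unsigned long fsbno;
    unsigned int  minlen;
    unsigned int  maxlen;
    int           userdata;
    void         *tp;
};

/* Zero everything, then set only the fields the caller cares about. */
static void init_args(struct alloc_arg_sketch *args, unsigned int minlen,
                      unsigned int maxlen)
{
    memset(args, 0, sizeof(*args));   /* no stale stack contents survive */
    args->minlen = minlen;
    args->maxlen = maxlen;
}
```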

commit 7e9620f21d8c9e389fd6845487e07d5df898a2e4
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Oct 8 21:56:12 2012 +1100

    xfs: only update the last_sync_lsn when a transaction completes

    The log write code stamps each iclog with the current tail LSN in
    the iclog header so that recovery knows where to find the tail of
    the log once it has found the head. Normally this is taken from the
    first item on the AIL - the log item that corresponds to the oldest
    active item in the log.

    The problem is that when the AIL is empty, the tail lsn is derived
    from the l_last_sync_lsn, which is the LSN of the last iclog to
    be written to the log. In most cases this doesn't happen, because
    the AIL is rarely empty on an active filesystem. However, when it
    does, it opens up an interesting case when the transaction being
    committed to the iclog spans multiple iclogs.

    That is, the first iclog is stamped with the l_last_sync_lsn, and IO
    is issued. Then the next iclog is set up, the changes are copied into the
    iclog (takes some time), and then the l_last_sync_lsn is stamped
    into the header and IO is issued. This is still the same
    transaction, so the tail lsn of both iclogs must be the same for log
    recovery to find the entire transaction to be able to replay it.

    The problem arises in that the iclog buffer IO completion updates
    the l_last_sync_lsn with its own LSN. Therefore, if the first iclog
    completes its IO before the second iclog is filled and has the tail
    lsn stamped in it, the second iclog will have the LSN of the first
    iclog stamped into its tail lsn field. If the system fails at this
    point, log recovery will not see a complete transaction, so the
    transaction will not be replayed.

    The fix is simple. Currently the l_last_sync_lsn is updated whenever
    an iclog buffer IO completes, and this is incorrect. The
    l_last_sync_lsn should be updated when a transaction is completed by
    an iclog buffer IO. That is, only iclog buffers that have transaction
    commit callbacks attached to them should update the l_last_sync_lsn.
    This means that the last_sync_lsn will only move forward when a
    commit record is written, not in the middle of a large transaction
    that is rolling through multiple iclog buffers.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>
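
A sketch of the corrected update rule, using hypothetical log_sketch and iclog_sketch types: when the AIL is empty, the tail LSN stamped into each iclog header comes from last_sync_lsn, so advancing it only on completion of an iclog that carries commit callbacks keeps both iclogs of a multi-iclog transaction stamped with the same tail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the log state; names loosely follow the text. */
struct log_sketch {
    long last_sync_lsn;
};

struct iclog_sketch {
    long lsn;
    bool has_commit_callbacks;   /* carries a transaction commit record */
};

/* IO completion: advance last_sync_lsn only for commit-bearing iclogs. */
static void iclog_io_done(struct log_sketch *lp, const struct iclog_sketch *ic)
{
    if (ic->has_commit_callbacks)
        lp->last_sync_lsn = ic->lsn;
}

/* With an empty AIL, the tail LSN stamped into a header is last_sync_lsn. */
static long tail_lsn_empty_ail(const struct log_sketch *lp)
{
    return lp->last_sync_lsn;
}
```

In the scenario from the commit message, the first iclog of the transaction completing early no longer changes the tail LSN stamped into the second.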

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/xfs_alloc.c       |   43 ++-----------------------------
 fs/xfs/xfs_alloc.h       |    3 ---
 fs/xfs/xfs_alloc_btree.c |    2 ++
 fs/xfs/xfs_bmap.c        |   63 +++++++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_bmap.h        |    9 ++++++-
 fs/xfs/xfs_buf_item.c    |   18 +++++++++++++
 fs/xfs/xfs_fsops.c       |   21 ++++++++++++++--
 fs/xfs/xfs_ialloc.c      |    1 +
 fs/xfs/xfs_inode.c       |    3 ++-
 fs/xfs/xfs_ioctl.c       |    2 +-
 fs/xfs/xfs_iomap.c       |    4 ++-
 fs/xfs/xfs_log.c         |   19 +++++++++++---
 fs/xfs/xfs_log_recover.c |    2 +-
 13 files changed, 127 insertions(+), 63 deletions(-)

hooks/post-receive
-- 
XFS development tree

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs