[XFS updates] XFS development tree branch, for-linus, updated. v2.6.36-rc8-11223-gc76febe

xfs@xxxxxxxxxxx · Wed, 1 Dec 2010 14:54:09 -0600

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, for-linus has been updated
  c76febe xfs: only run xfs_error_test if error injection is active
  de25c18 xfs: avoid moving stale inodes in the AIL
  309c848 xfs: delayed alloc blocks beyond EOF are valid after writeback
  90810b9 xfs: push stale, pinned buffers on trylock failures
  c726de4 xfs: fix failed write truncation handling.
      from  ece413f59f257682de4a2e2e42af33b016af53f3 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit c76febef574fd86566bbdf1a73a547a439115c25
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 30 15:15:31 2010 +1100

    xfs: only run xfs_error_test if error injection is active

    Recent tests writing lots of small files showed the flusher thread
    being CPU bound and taking a long time to do allocations on a debug
    kernel. perf showed this as the prime reason:

                 samples  pcnt function                    DSO
                 _______ _____ ___________________________ _________________

               224648.00 36.8% xfs_error_test              [kernel.kallsyms]
                86045.00 14.1% xfs_btree_check_sblock      [kernel.kallsyms]
                39778.00  6.5% prandom32                   [kernel.kallsyms]
                37436.00  6.1% xfs_btree_increment         [kernel.kallsyms]
                29278.00  4.8% xfs_btree_get_rec           [kernel.kallsyms]
                27717.00  4.5% random32                    [kernel.kallsyms]

    Walking and checking btree blocks during allocation requires each
    block (a cache hit, so no I/O) to call xfs_error_test(), which then
    does a random32() call as the first operation.  IOWs, ~50% of the
    CPU is being consumed just testing whether we need to inject an
    error, even though error injection is not active.

    Kill this overhead when error injection is not active by adding a
    global counter of active error traps and only calling into
    xfs_error_test when fault injection is active.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
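
The gating idea in the commit above can be sketched in userspace C. This is a minimal model, not the kernel change itself: every name here (active_error_traps, error_test, error_test_slow, error_trap_arm, error_trap_disarm, slow_path_calls) is hypothetical, and rand() stands in for the kernel's random32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: count of armed error traps; zero means inactive. */
int active_error_traps;

/* Demonstration only: how many times the expensive path actually ran. */
unsigned long slow_path_calls;

/* Expensive check: draws a random number on every call. */
bool error_test_slow(unsigned long rand_factor)
{
    assert(rand_factor > 0);
    slow_path_calls++;
    return ((unsigned long)rand() % rand_factor) == 0;
}

/* Cheap gate: skip the random draw entirely when no trap is armed. */
bool error_test(unsigned long rand_factor)
{
    if (active_error_traps == 0)
        return false;
    return error_test_slow(rand_factor);
}

void error_trap_arm(void)
{
    active_error_traps++;
}

void error_trap_disarm(void)
{
    if (active_error_traps > 0)
        active_error_traps--;
}
```

With no trap armed, error_test() costs one integer comparison and never reaches the random number generator, which is the overhead the perf profile attributes to the ungated path.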

commit de25c1818c44f580ff556cb9e0f7a1c687ed870b
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 30 15:15:46 2010 +1100

    xfs: avoid moving stale inodes in the AIL

    When an inode has been marked stale because the cluster is being
    freed, we don't want to (re-)insert this inode into the AIL. There
    is a race condition where the cluster buffer may be unpinned before
    the inode is inserted into the AIL during transaction commit
    processing. If the buffer is unpinned before the inode item has been
    committed and inserted, then it is possible for the buffer to be
    released and hence process the stale inode callbacks before the inode
    is inserted into the AIL.

    In this case, we then insert a clean, stale inode into the AIL which
    will never get removed by an IO completion. It will, however, get
    reclaimed and that triggers an assert in xfs_inode_free()
    complaining about freeing an inode still in the AIL.

    This race can be avoided by not moving stale inodes forward in the AIL
    during transaction commit completion processing. This closes the
    race condition by ensuring we never insert clean stale inodes into
    the AIL. It is safe to do this because a dirty stale inode, by
    definition, must already be in the AIL.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
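
The rule described in the commit above (never move or insert a stale item at commit completion) can be modelled roughly as follows. This is a simplified sketch under stated assumptions: struct log_item and ail_update_on_commit() are hypothetical names, and the AIL is reduced to an in_ail flag plus an LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified log item: not the kernel structure. */
struct log_item {
    bool     stale;   /* the backing cluster is being freed */
    bool     in_ail;  /* currently tracked in the AIL */
    uint64_t lsn;     /* position in the AIL when in_ail is true */
};

/*
 * Commit completion processing for one item.
 * Returns true if the item was inserted or moved forward.
 */
bool ail_update_on_commit(struct log_item *item, uint64_t commit_lsn)
{
    assert(item);

    /*
     * A dirty stale item is already in the AIL, so it needs no move.
     * A clean stale item must stay out, or nothing would remove it.
     */
    if (item->stale)
        return false;

    if (!item->in_ail || commit_lsn > item->lsn) {
        item->in_ail = true;
        item->lsn = commit_lsn;
        return true;
    }
    return false;
}
```

In this model a clean stale item can never enter the AIL through the commit path, so a later free of that item cannot find it still tracked.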

commit 309c848002052edbec650075a1eb098b17c17f35
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 30 15:16:02 2010 +1100

    xfs: delayed alloc blocks beyond EOF are valid after writeback

    There is an assumption in parts of XFS that flushing a dirty
    file will make all the delayed allocation blocks disappear from an
    inode. That is, after calling xfs_flush_pages(),
    ip->i_delayed_blks will be zero.

    This is an invalid assumption as we may have speculative
    preallocation beyond EOF, and those blocks are recorded in
    ip->i_delayed_blks. A flush of the dirty pages of an inode will not
    change the state of these blocks beyond EOF, so a non-zero
    delalloc block count after a flush is valid.

    The bmap code has an invalid ASSERT() that needs to be removed, and
    the swapext code has a bug in that while it swaps the data forks
    around, it fails to swap the i_delayed_blks counter associated with
    the fork and hence can get the block accounting wrong.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
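
The swapext accounting problem described above comes down to swapping a fork without swapping the counter that describes it. A minimal sketch, assuming hypothetical types (struct data_fork, struct inode_model) that only stand in for the real inode fields:

```c
#include <assert.h>

/* Hypothetical stand-in for a data fork; only the extent count is modelled. */
struct data_fork {
    unsigned long nextents;
};

/* Hypothetical stand-in for an inode with its delalloc block counter. */
struct inode_model {
    struct data_fork fork;
    unsigned long    delayed_blks;  /* delalloc blocks, including beyond EOF */
};

/* Swap the data forks and the delalloc counters that belong to them. */
void swap_data_forks(struct inode_model *a, struct inode_model *b)
{
    assert(a && b);

    struct data_fork tmp_fork = a->fork;
    a->fork = b->fork;
    b->fork = tmp_fork;

    /* Without this step, each inode would account for the other's blocks. */
    unsigned long tmp_blks = a->delayed_blks;
    a->delayed_blks = b->delayed_blks;
    b->delayed_blks = tmp_blks;
}
```

Because a flush does not clear preallocation beyond EOF, either counter may legitimately be non-zero at swap time, so the counter has to travel with the fork.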

commit 90810b9e82a36c3c57c1aeb8b2918b242a130b26
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 30 15:16:16 2010 +1100

    xfs: push stale, pinned buffers on trylock failures

    As reported by Nick Piggin, XFS is suffering from long pauses under
    highly concurrent workloads when hosted on ramdisks. The problem is
    that an inode buffer is stuck in the pinned state in memory and as a
    result either the inode buffer or one of the inodes within the
    buffer is stopping the tail of the log from being moved forward.

    The system remains in this state until a periodic log force issued
    by xfssyncd causes the buffer to be unpinned. The main problem is
    that these are stale buffers, and are hence held locked until the
    transaction/checkpoint that marked them stale has been committed to
    disk. When the filesystem gets into this state, only the xfssyncd
    can cause the async transactions to be committed to disk and hence
    unpin the inode buffer.

    This problem was encountered when scaling the busy extent list, but
    only the blocking lock interface was fixed to solve the problem.
    Extend the same fix to the buffer trylock operations - if we fail to
    lock a pinned, stale buffer, then force the log immediately so that
    when the next attempt to lock it comes around, it will have been
    unpinned.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
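
The trylock behaviour described above can be illustrated with a small state model. This is a sketch, not the kernel locking code: struct log_model, struct buf_model, log_force_model() and buf_trylock() are hypothetical, and the log force is assumed to complete synchronously and release the stale buffer, which simplifies what the real log force does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical log: only counts how many forces were issued. */
struct log_model {
    unsigned long forces;
};

/* Hypothetical buffer state; flags replace real locks for clarity. */
struct buf_model {
    bool locked;
    bool pinned;
    bool stale;
    struct log_model *log;
};

/*
 * Model assumption: forcing the log commits the checkpoint that marked
 * the buffer stale, which unpins it and drops the lock held on it.
 */
void log_force_model(struct buf_model *bp)
{
    bp->log->forces++;
    bp->pinned = false;
    bp->locked = false;
}

/* Non-blocking lock attempt; returns true if the lock was taken. */
bool buf_trylock(struct buf_model *bp)
{
    assert(bp && bp->log);

    if (!bp->locked) {
        bp->locked = true;
        return true;
    }

    /* Failed on a pinned, stale buffer: push the log so a retry can succeed. */
    if (bp->pinned && bp->stale)
        log_force_model(bp);
    return false;
}
```

The first attempt on a pinned, stale buffer still fails, but it triggers the force, so the caller's next attempt does not have to wait for a periodic log force.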

commit c726de4409a8d3a03877b1ef4342bfe8a15f5e5e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 30 15:14:39 2010 +1100

    xfs: fix failed write truncation handling.

    Since the move to the new truncate sequence we call xfs_setattr to
    truncate down excessively instantiated blocks.  As shown by the testcase
    in kernel.org BZ #22452 that doesn't work too well.  Due to confusion
    between the internal inode size and the VFS inode i_size, it zeroes data
    that it shouldn't.

    But full blown truncate seems like overkill here.  We only instantiate
    delayed allocations in the write path, and given that we never released
    the iolock we can't have converted them to real allocations yet either.

    The only nasty case is pre-existing preallocation which we need to skip.
    We already do this for page discard during writeback, so make the delayed
    allocation block punching a generic function and call it from the failed
    write path as well as xfs_aops_discard_page. The callers are
    responsible for ensuring that partial blocks are not truncated away,
    and that they hold the ilock.

    Based on a fix originally from Christoph Hellwig. This version uses
    filesystem blocks as the range unit.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
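
Since the punch range is expressed in filesystem blocks and callers must not truncate away partial blocks, a caller holding a byte range needs to shrink it to whole blocks first. The following is one way to do that arithmetic, as an illustration only: struct fsb_range and whole_blocks_in() are hypothetical names, not functions from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a run of whole filesystem blocks. */
struct fsb_range {
    uint64_t start_fsb;  /* first block fully inside the byte range */
    uint64_t count;      /* number of whole blocks; 0 if none */
};

/*
 * Convert the byte range [start_byte, end_byte) into the filesystem
 * blocks it fully covers. Partial blocks at either end are excluded.
 * Overflow of start_byte + block_size is not handled in this sketch.
 */
struct fsb_range whole_blocks_in(uint64_t start_byte, uint64_t end_byte,
                                 uint32_t block_size)
{
    struct fsb_range r = { 0, 0 };

    assert(block_size != 0);
    if (end_byte <= start_byte)
        return r;

    /* Round the start up and the end down to block boundaries. */
    uint64_t first = (start_byte + block_size - 1) / block_size;
    uint64_t last = end_byte / block_size;  /* exclusive */

    r.start_fsb = first;
    if (last > first)
        r.count = last - first;
    return r;
}
```

For example, with 4096-byte blocks the byte range [5000, 20000) covers blocks 2 and 3 in full, so only those two blocks would be candidates for punching.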

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/linux-2.6/xfs_aops.c |   94 ++++++++++++++++++------------------------
 fs/xfs/linux-2.6/xfs_buf.c  |   35 +++++++---------
 fs/xfs/xfs_bmap.c           |   85 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_bmap.h           |    5 ++
 fs/xfs/xfs_dfrag.c          |   13 ++++++
 fs/xfs/xfs_error.c          |    3 +
 fs/xfs/xfs_error.h          |    5 +-
 fs/xfs/xfs_inode_item.c     |   31 +++++++++++---
 8 files changed, 188 insertions(+), 83 deletions(-)

hooks/post-receive
-- 
XFS development tree
