[XFS updates] XFS development tree branch, master, updated. v2.6.34-19248-g2bfc96a

xfs@xxxxxxxxxxx · Mon, 30 Aug 2010 13:29:22 -0500

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, master has been updated
  b5420f2 xfs: do not discard page cache data on EAGAIN
  3b93c7a xfs: don't do memory allocation under the CIL context lock
  a44f13e xfs: Reduce log force overhead for delayed logging
  1a387d3 xfs: dummy transactions should not dirty VFS state
  2fe3366 xfs: ensure f_ffree returned by statfs() is non-negative
  efceab1 xfs: handle negative wbc->nr_to_write during sync writeback
  4536f2a xfs: fix untrusted inode number lookup
  5b3eed7 xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
  d17c701 xfs: unlock items before allowing the CIL to commit
  5f248c9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
  b57922d convert remaining ->clear_inode() to ->evict_inode()
  a4ffdde simplify checks for I_CLEAR/I_FREEING
  fa9b227 xfs: new truncate sequence
  155130a get rid of block_write_begin_newtrunc
  eafdc7d sort out blockdev_direct_IO variants
  90e0c22 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
  ade7ce3 quota: Clean up the namespace in dqblk_xfs.h
      from  6b0a2996a0c023d84bc27ec7528a6e54cb5ea264 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit b5420f235953448eeae615b3361584dc5e414f34
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Tue Aug 24 11:47:51 2010 +1000

    xfs: do not discard page cache data on EAGAIN

    If xfs_map_blocks returns EAGAIN because of lock contention we must redirty the
    page rather than discard the pagecache content and return an error from writepage.
    We used to do this correctly, but the logic got lost during the recent
    reshuffle of the writepage code.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reported-by: Mike Gao <ygao.linux@xxxxxxxxx>
    Tested-by: Mike Gao <ygao.linux@xxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit 3b93c7aaefc05ee2a75e2726929b01a321402984
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:45:53 2010 +1000

    xfs: don't do memory allocation under the CIL context lock

    Formatting items requires memory allocation when using delayed
    logging. Currently that memory allocation is done while holding the
    CIL context lock in read mode. This means that if memory allocation
    takes some time (e.g. enters reclaim), we cannot push on the CIL
    until the allocation(s) required by formatting complete. This can
    stall CIL pushes for some time, and once a push is stalled so are
    all new transaction commits.

    Fix this by splitting the item formatting into two steps. The first
    step which does the allocation and memcpy() into the allocated
    buffer is now done outside the CIL context lock, and only the CIL
    insert is done inside the CIL context lock. This avoids the stall
    issue.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
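
The two-step split described in this commit can be illustrated with a minimal
user-space sketch. It is not the kernel implementation: struct log_vec,
struct cil_list, format_item() and cil_insert() are hypothetical names, and a
pthread mutex stands in for the CIL context lock (which the commit says is
held in read mode). The point is only that the allocation and memcpy() happen
before the lock is taken, and the locked region is reduced to the list insert.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical formatted log item: a private copy of the item's data. */
struct log_vec {
    struct log_vec *next;
    size_t len;
    char *buf;
};

/* Hypothetical stand-in for the CIL: a locked singly linked list. */
struct cil_list {
    pthread_mutex_t lock;   /* stands in for the CIL context lock */
    struct log_vec *head;
    size_t count;
};

/* Step 1: allocate and copy with no lock held, so a slow allocation
 * cannot hold up anyone waiting on the lock. */
static struct log_vec *format_item(const void *data, size_t len)
{
    struct log_vec *lv = malloc(sizeof(*lv));

    if (!lv)
        return NULL;
    lv->buf = malloc(len ? len : 1);
    if (!lv->buf) {
        free(lv);
        return NULL;
    }
    memcpy(lv->buf, data, len);
    lv->len = len;
    lv->next = NULL;
    return lv;
}

/* Step 2: only the pointer updates are done under the lock. */
static void cil_insert(struct cil_list *cil, struct log_vec *lv)
{
    pthread_mutex_lock(&cil->lock);
    lv->next = cil->head;
    cil->head = lv;
    cil->count++;
    pthread_mutex_unlock(&cil->lock);
}
```

A caller would run format_item() first and call cil_insert() only once the
buffer is fully built, so the critical section never allocates.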

commit a44f13edf0ebb4e41942d0f16ca80489dcf6659d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:40:03 2010 +1000

    xfs: Reduce log force overhead for delayed logging

    Delayed logging adds some serialisation to the log force process to
    ensure that it does not dereference a bad commit context structure
    when determining if a CIL push is necessary or not. It does this by
    grabbing the CIL context lock exclusively, then dropping it before
    pushing the CIL if necessary. This causes serialisation of all log
    forces and pushes regardless of whether a force is necessary or not.
    As a result fsync heavy workloads (like dbench) can be significantly
    slower with delayed logging than without.

    To avoid this penalty, copy the current sequence from the context to
    the CIL structure when they are swapped. This allows us to do
    unlocked checks on the current sequence without having to worry
    about dereferencing context structures that may have already been
    freed. Hence we can remove the CIL context locking in the forcing
    code and only call into the push code if the current context matches
    the sequence we need to force.

    By passing the sequence into the push code, we can check the
    sequence again once we have the CIL lock held exclusive and abort if
    the sequence has already been pushed. This avoids a lock round-trip
    and unnecessary CIL pushes when we have racing push calls.

    The result is that the regression in dbench performance goes away -
    this change improves dbench performance on a ramdisk from ~2100MB/s
    to ~2500MB/s. This compares favourably to not using delayed logging
    which returns ~2500MB/s for the same workload.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
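
The unlocked sequence check and the re-check under the lock can be sketched
in user-space C roughly as follows. This is an illustration under stated
assumptions, not the kernel code: struct cil, cil_force() and cil_push() are
hypothetical names, a pthread mutex stands in for the exclusively held CIL
context lock, and a C11 atomic stands in for the sequence number copied into
the CIL structure.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CIL structure. */
struct cil {
    pthread_mutex_t lock;      /* stands in for the CIL context lock */
    atomic_ulong current_seq;  /* copy of the active context's sequence */
    unsigned long pushes;      /* pushes actually performed */
};

/* Push path: re-check the sequence once the lock is held and abort
 * if a racing caller has already pushed it. */
static bool cil_push(struct cil *cil, unsigned long seq)
{
    bool pushed = false;

    pthread_mutex_lock(&cil->lock);
    if (atomic_load(&cil->current_seq) == seq) {
        cil->pushes++;
        /* Switching to a new context advances the sequence. */
        atomic_store(&cil->current_seq, seq + 1);
        pushed = true;
    }
    pthread_mutex_unlock(&cil->lock);
    return pushed;
}

/* Force path: an unlocked check decides whether the push code needs
 * to be called at all. */
static bool cil_force(struct cil *cil, unsigned long seq)
{
    if (atomic_load(&cil->current_seq) != seq)
        return false;   /* not the current sequence: no push from here */
    return cil_push(cil, seq);
}
```

In this sketch a force for a sequence that is no longer current returns
without touching the lock, which is the serialisation the commit removes.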

commit 1a387d3be2b30c90f20d49a3497a8fc0693a9d18
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:46:31 2010 +1000

    xfs: dummy transactions should not dirty VFS state

    When we need to cover the log, we issue dummy transactions to ensure
    the current log tail is on disk. Unfortunately we currently use the
    root inode in the dummy transaction, and the act of committing the
    transaction dirties the inode at the VFS level.

    As a result, the VFS writeback of the dirty inode will prevent the
    filesystem from idling long enough for the log covering state
    machine to complete. The state machine gets stuck in a loop issuing
    new dummy transactions to cover the log and never makes progress.

    To avoid this problem, the dummy transactions should not cause
    externally visible state changes. To ensure this occurs, make sure
    that dummy transactions log an unchanging field in the superblock as
    its state is never propagated outside the filesystem. This allows
    the log covering state machine to complete successfully and the
    filesystem now correctly enters a fully idle state about 90s after
    the last modification was made.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 2fe33661fcd79d4c53022509f7223d526b5fa233
Author: Stuart Brodsky <sbrodsky@xxxxxxx>
Date:   Tue Aug 24 11:46:05 2010 +1000

    xfs: ensure f_ffree returned by statfs() is non-negative

    Because of delayed updates to sb_icount field in the super block, it
    is possible to allocate over maxicount number of inodes.  This
    causes the arithmetic to calculate a negative number of free inodes
    in user commands like df or stat -f.

    Since maxicount is a somewhat arbitrary number, a slight over
    allocation is not critical, but the value shown by user commands
    should never be negative.  To ensure this, the f_ffree value in the
    stats buffer is clamped at zero.

    [ Modified to use max_t as per Christoph's comment. ]

    Signed-off-by: Stu Brodsky <sbrodsky@xxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
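
The clamp described in this commit can be illustrated with a small user-space
sketch. The function name and its parameters are hypothetical stand-ins for
the superblock counters, and the ternary plays the role of the kernel's
max_t() macro; the actual patch may compute the intermediate value
differently.

```c
#include <stdint.h>

/*
 * Hypothetical helper: report the number of free inodes, never going
 * below zero.  max_files, icount and ifree are illustrative parameters,
 * not the exact fields used by the patch.
 */
static uint64_t free_inodes_reported(uint64_t max_files, uint64_t icount,
                                     uint64_t ifree)
{
    /* Signed arithmetic, so over-allocation shows up as a negative. */
    int64_t ffree = (int64_t)max_files - (int64_t)(icount - ifree);

    /* Same effect as max_t(int64_t, ffree, 0). */
    return ffree > 0 ? (uint64_t)ffree : 0;
}
```

With a limit of 100 inodes, 120 allocated and 10 of those free, the raw
result would be -10; the clamp reports 0 instead.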

commit efceab1d563153a2b1a6e7d35376241a48126989
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:44:56 2010 +1000

    xfs: handle negative wbc->nr_to_write during sync writeback

    During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
    go negative on inodes with more than 1024 dirty pages due to
    implementation details of write_cache_pages(). Currently XFS will
    abort page clustering in writeback once nr_to_write drops below
    zero, and so for data integrity writeback we will do very
    inefficient page at a time allocation and IO submission for inodes
    with large numbers of dirty pages.

    Fix this by only aborting the page clustering code when
    wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
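
The changed abort condition can be written as a single predicate. The sketch
below is illustrative only: the enum and the function name are hypothetical
stand-ins for WB_SYNC_NONE / WB_SYNC_ALL and the check inside the page
clustering code, and the threshold follows the wording of the commit message
(negative) rather than the patch itself.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's WB_SYNC_NONE and WB_SYNC_ALL. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/*
 * Hypothetical predicate: stop clustering pages only when the write
 * budget is exhausted AND this is not data integrity writeback.
 */
static bool should_stop_clustering(long nr_to_write, enum sync_mode mode)
{
    return nr_to_write < 0 && mode == SYNC_NONE;
}
```

Under this predicate, data integrity writeback keeps clustering pages even
after nr_to_write has gone negative.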

commit 4536f2ad8b330453d7ebec0746c4374eadd649b1
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:42:30 2010 +1000

    xfs: fix untrusted inode number lookup

    Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
    numbers during lookup") changes the inode lookup code to do btree lookups for
    untrusted inode numbers. This change made an invalid assumption about the
    alignment of inodes and hence incorrectly calculated the first inode in the
    cluster. As a result, some inode numbers were being incorrectly considered
    invalid when they were actually valid.

    The issue was not picked up by the xfstests suite because it always runs fsr
    and dump (the two utilities that utilise the bulkstat interface) on cache hot
    inodes and hence the lookup code in the cold cache path was not sufficiently
    exercised to uncover this intermittent problem.

    Fix the issue by relaxing the btree lookup criteria and then checking if the
    record returned contains the inode number we are looking up. If we get an
    incorrect record, then the inode number is invalid.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
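
The "relaxed lookup, then range check" approach can be sketched in user-space
C. This is a simplified illustration, not the XFS code: a sorted array of
chunk start numbers stands in for the inode btree, the relaxed criterion is
assumed to be a "less than or equal" search, and INODES_PER_CHUNK, lookup_le()
and inode_number_valid() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64   /* illustrative chunk size */

/* Return the index of the last record whose start is <= ino,
 * or -1 if every record starts after ino. */
static ptrdiff_t lookup_le(const uint32_t *starts, size_t n, uint32_t ino)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (starts[mid] <= ino)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (ptrdiff_t)lo - 1;
}

/* An inode number is accepted only if the record found by the relaxed
 * lookup actually covers it. */
static bool inode_number_valid(const uint32_t *starts, size_t n, uint32_t ino)
{
    ptrdiff_t i = lookup_le(starts, n, ino);

    if (i < 0)
        return false;
    return ino - starts[i] < INODES_PER_CHUNK;
}
```

The lookup makes no assumption about how the chunk containing the inode is
aligned; validity is decided purely by whether the returned record's range
contains the number.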

commit 5b3eed756cd37255cad1181bd86bfd0977e97953
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:42:41 2010 +1000

    xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE

    Under heavy parallel metadata loads (e.g. dbench), we can fail
    to mark all the inodes in a cluster being freed as XFS_ISTALE as we
    skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
    When this happens and the inode cluster buffer has already been
    marked stale and freed, inode reclaim can try to write the inode out
    as it is dirty and not marked stale. This can result in writing the
    metadata to a freed extent, or in the case it has already
    been overwritten trigger a magic number check failure and return an
    EUCLEAN error such as:

    Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117

    Fix this by ensuring that we hoover up all in memory inodes in the
    cluster and mark them XFS_ISTALE when freeing the cluster.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit d17c701ce6a548a92f7f8a3cec20299465f36ee3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:42:52 2010 +1000

    xfs: unlock items before allowing the CIL to commit

    When we commit a transaction using delayed logging, we need to
    unlock the items in the transaction before we unlock the CIL context
    and allow it to be checkpointed. If we unlock them after we release
    the CIL context lock, the CIL can checkpoint and complete before
    we free the log items. This breaks stale buffer item unlock and
    unpin processing as there is an implicit assumption that the unlock
    will occur before the unpin.

    Also, some log items need to store the LSN of the transaction commit
    in the item (inodes and EFIs) and so can race with other transaction
    completions if we don't prevent the CIL from checkpointing before
    the unlock occurs.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 5f248c9c251c60af3403902b26e08de43964ea0b
Merge: f6cec0ae58c17522a7bc4e2f39dae19f199ab534 dca332528bc69e05f67161e1ed59929633d5e63d
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Tue Aug 10 11:26:52 2010 -0700

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
      no need for list_for_each_entry_safe()/resetting with superblock list
      Fix sget() race with failing mount
      vfs: don't hold s_umount over close_bdev_exclusive() call
      sysv: do not mark superblock dirty on remount
      sysv: do not mark superblock dirty on mount
      btrfs: remove junk sb_dirt change
      BFS: clean up the superblock usage
      AFFS: wait for sb synchronization when needed
      AFFS: clean up dirty flag usage
      cifs: truncate fallout
      mbcache: fix shrinker function return value
      mbcache: Remove unused features
      add f_flags to struct statfs(64)
      pass a struct path to vfs_statfs
      update VFS documentation for method changes.
      All filesystems that need invalidate_inode_buffers() are doing that explicitly
      convert remaining ->clear_inode() to ->evict_inode()
      Make ->drop_inode() just return whether inode needs to be dropped
      fs/inode.c:clear_inode() is gone
      fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
      ...

    Fix up trivial conflicts in fs/nilfs2/super.c

commit b57922d97fd6f79b6dbe6db0c4fd30d219fa08c1
Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date:   Mon Jun 7 14:34:48 2010 -0400

    convert remaining ->clear_inode() to ->evict_inode()

    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit a4ffdde6e56fdf8c34ddadc2674d6eb978083369
Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date:   Wed Jun 2 17:38:30 2010 -0400

    simplify checks for I_CLEAR/I_FREEING

    add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
    equivalent to I_FREEING for almost all code looking at either;
    it's there to keep track of having called clear_inode() exactly
    once per inode lifetime, at some point after having set I_FREEING.
    I_CLEAR and I_FREEING never get set at the same time with the
    current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
    instead of I_CLEAR without loss of information.  As the result of
    such change, checks become simpler and the amount of code that needs
    to know about I_CLEAR shrinks a lot.

    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
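
The simplification can be illustrated with plain bit flags. The sketch below
is not kernel code: the flag values are illustrative and the helper names are
hypothetical. It only shows why setting both bits together lets callers test
a single bit.

```c
#include <stdbool.h>

/* Illustrative flag values; only their distinctness matters here. */
#define I_FREEING 0x20u
#define I_CLEAR   0x40u

/* Old scheme: I_CLEAR replaced I_FREEING, so callers had to test both. */
static bool going_away_old(unsigned int state)
{
    return (state & (I_FREEING | I_CLEAR)) != 0;
}

/* New scheme: I_CLEAR is only ever set together with I_FREEING,
 * so testing I_FREEING alone is enough. */
static bool going_away_new(unsigned int state)
{
    return (state & I_FREEING) != 0;
}

/* State stored once the inode has been cleared, under each scheme. */
static unsigned int cleared_state_old(void) { return I_CLEAR; }
static unsigned int cleared_state_new(void) { return I_FREEING | I_CLEAR; }
```

Code that still needs to know whether the inode has been cleared can keep
testing I_CLEAR; everything else only has to look at I_FREEING.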

commit fa9b227e9019ebaeeb06224ba531a490f91144b3
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Jun 14 05:17:31 2010 -0400

    xfs: new truncate sequence

    Convert XFS to the new truncate sequence.  We still can have errors after
    updating the file size in xfs_setattr, but these are real I/O errors and lead
    to a transaction abort and filesystem shutdown, so they are not an issue.

    Errors from ->write_begin and ->write_end can now be handled correctly because
    we can actually get rid of the delalloc extents, whereas previously the buffer
    state was stripped in block_invalidatepage.

    There is still no error handling for ->direct_IO, because doing so will need
    some major restructuring given that we only have the iolock shared and do not
    hold i_mutex at all.  Fortunately leaving the normally allocated blocks behind
    there is not a major issue and this will get cleaned up by xfs_free_eofblocks
    later.

    Note: the patch is against Al's vfs.git tree as that contains the necessary
    preparations.  I'd prefer to get it applied there so that we can get some
    testing in linux-next.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit 155130a4f7848b1aac439cab6bda1a175507c71c
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Jun 4 11:29:58 2010 +0200

    get rid of block_write_begin_newtrunc

    Move the call to vmtruncate to get rid of excess blocks to the callers
    in preparation for the new truncate sequence and rename the non-truncating
    version to block_write_begin.

    While we're at it also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit eafdc7d190a944c755a9fe68573c193e6e0217e7
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Jun 4 11:29:53 2010 +0200

    sort out blockdev_direct_IO variants

    Move the call to vmtruncate to get rid of excess blocks to the callers
    in preparation for the new truncate calling sequence.  This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it as just opencoding the two additional
    parameters is shorter than the name suffix.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit 90e0c225968f0878e090c7ff3f88323973476cee
Merge: 938a73b959cf77aadc41bded3bf416b618aa20b3 5f11e6a44059f728dddd8d0dbe5b4368ea93575b
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Sat Aug 7 12:57:07 2010 -0700

    Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
      ext3: Fix dirtying of journalled buffers in data=journal mode
      ext3: default to ordered mode
      quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
      quota: Change quota error message to print out disk and function name
      MAINTAINERS: Update entries of ext2 and ext3
      MAINTAINERS: Update address of Andreas Dilger
      ext3: Avoid filesystem corruption after a crash under heavy delete load
      ext3: remove vestiges of nobh support
      ext3: Fix set but unused variables
      quota: clean up quota active checks
      quota: Clean up the namespace in dqblk_xfs.h
      quota: check quota reservation on remove_dquot_ref

commit ade7ce31c22e961dfbe1a6d57fd362c90c187cbd
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Jun 4 10:56:01 2010 +0200

    quota: Clean up the namespace in dqblk_xfs.h

    Almost all identifiers use the FS_* namespace, so rename the missing few
    XFS_* ones to FS_* as well.  Without this some people might get upset
    about having too many XFS names in generic code.

    Acked-by: Steven Whitehouse <swhiteho@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Jan Kara <jack@xxxxxxx>

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/linux-2.6/xfs_aops.c     |   75 +++++++++---
 fs/xfs/linux-2.6/xfs_iops.c     |   20 +---
 fs/xfs/linux-2.6/xfs_linux.h    |    2 -
 fs/xfs/linux-2.6/xfs_quotaops.c |   10 +-
 fs/xfs/linux-2.6/xfs_super.c    |   17 ++-
 fs/xfs/linux-2.6/xfs_sync.c     |   42 +------
 fs/xfs/linux-2.6/xfs_trace.h    |    2 +-
 fs/xfs/quota/xfs_qm_syscalls.c  |   32 +++---
 fs/xfs/xfs_fsops.c              |   31 +++--
 fs/xfs/xfs_fsops.h              |    2 +-
 fs/xfs/xfs_ialloc.c             |   16 ++-
 fs/xfs/xfs_inode.c              |   49 ++++----
 fs/xfs/xfs_log.c                |    7 +-
 fs/xfs/xfs_log_cil.c            |  263 +++++++++++++++++++++++----------------
 fs/xfs/xfs_log_priv.h           |   13 ++-
 fs/xfs/xfs_trans.c              |    5 +-
 fs/xfs/xfs_trans_priv.h         |    3 +-
 fs/xfs/xfs_vnodeops.c           |   38 +++---
 18 files changed, 350 insertions(+), 277 deletions(-)

hooks/post-receive
-- 
XFS development tree
