This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, master has been updated
  b5420f2 xfs: do not discard page cache data on EAGAIN
  3b93c7a xfs: don't do memory allocation under the CIL context lock
  a44f13e xfs: Reduce log force overhead for delayed logging
  1a387d3 xfs: dummy transactions should not dirty VFS state
  2fe3366 xfs: ensure f_ffree returned by statfs() is non-negative
  efceab1 xfs: handle negative wbc->nr_to_write during sync writeback
  4536f2a xfs: fix untrusted inode number lookup
  5b3eed7 xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
  d17c701 xfs: unlock items before allowing the CIL to commit
  5f248c9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
  b57922d convert remaining ->clear_inode() to ->evict_inode()
  a4ffdde simplify checks for I_CLEAR/I_FREEING
  fa9b227 xfs: new truncate sequence
  155130a get rid of block_write_begin_newtrunc
  eafdc7d sort out blockdev_direct_IO variants
  90e0c22 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
  ade7ce3 quota: Clean up the namespace in dqblk_xfs.h
      from  6b0a2996a0c023d84bc27ec7528a6e54cb5ea264 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit b5420f235953448eeae615b3361584dc5e414f34
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Tue Aug 24 11:47:51 2010 +1000

    xfs: do not discard page cache data on EAGAIN

    If xfs_map_blocks returns EAGAIN because of lock contention we must
    redirty the page, rather than discard the pagecache content and
    return an error from writepage. We used to do this correctly, but
    the logic got lost during the recent reshuffle of the writepage code.
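The EAGAIN handling described above can be sketched in plain C as follows. This is a minimal illustration, not the code from the commit: `struct sketch_page`, its fields, and `sketch_writepage_finish()` are hypothetical names, and the negative-errno convention for the mapping result is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page cache page; not the kernel's struct page. */
struct sketch_page {
	bool dirty;      /* page still has to be written back */
	bool discarded;  /* page cache contents were thrown away */
};

/*
 * Hypothetical tail of a writepage implementation. map_error is the result
 * of the block mapping step: 0 on success, a negative errno on failure
 * (assumed convention).
 */
static int sketch_writepage_finish(struct sketch_page *page, int map_error)
{
	if (map_error == 0) {
		page->dirty = false;    /* I/O was submitted for this page */
		return 0;
	}

	if (map_error == -EAGAIN) {
		/* Lock contention: keep the data, leave the page dirty so
		 * that writeback retries later, and report no error. */
		page->dirty = true;
		return 0;
	}

	/* Any other failure: drop the page contents and report the error. */
	page->discarded = true;
	page->dirty = false;
	return map_error;
}
```

The point of the sketch is only the ordering of the checks: the EAGAIN case is tested before the generic error path, so a contended lock never reaches the branch that discards data.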

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reported-by: Mike Gao <ygao.linux@xxxxxxxxx>
    Tested-by: Mike Gao <ygao.linux@xxxxxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit 3b93c7aaefc05ee2a75e2726929b01a321402984
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:45:53 2010 +1000

    xfs: don't do memory allocation under the CIL context lock

    Formatting items requires memory allocation when using delayed
    logging. Currently that memory allocation is done while holding the
    CIL context lock in read mode. This means that if memory allocation
    takes some time (e.g. enters reclaim), we cannot push on the CIL
    until the allocation(s) required by formatting complete. This can
    stall CIL pushes for some time, and once a push is stalled so are
    all new transaction commits.

    Fix this by splitting the item formatting into two steps. The first
    step, which does the allocation and memcpy() into the allocated
    buffer, is now done outside the CIL context lock, and only the CIL
    insert is done inside the CIL context lock. This avoids the stall
    issue.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit a44f13edf0ebb4e41942d0f16ca80489dcf6659d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:40:03 2010 +1000

    xfs: Reduce log force overhead for delayed logging

    Delayed logging adds some serialisation to the log force process to
    ensure that it does not dereference a bad commit context structure
    when determining if a CIL push is necessary or not. It does this by
    grabbing the CIL context lock exclusively, then dropping it before
    pushing the CIL if necessary. This causes serialisation of all log
    forces and pushes regardless of whether a force is necessary or not.
    As a result fsync heavy workloads (like dbench) can be significantly
    slower with delayed logging than without.

    To avoid this penalty, copy the current sequence from the context to
    the CIL structure when they are swapped. This allows us to do
    unlocked checks on the current sequence without having to worry
    about dereferencing context structures that may have already been
    freed. Hence we can remove the CIL context locking in the forcing
    code and only call into the push code if the current context matches
    the sequence we need to force.

    By passing the sequence into the push code, we can check the
    sequence again once we have the CIL lock held exclusive and abort if
    the sequence has already been pushed. This avoids a lock round-trip
    and unnecessary CIL pushes when we have racing push calls.

    The result is that the regression in dbench performance goes away -
    this change improves dbench performance on a ramdisk from ~2100MB/s
    to ~2500MB/s. This compares favourably to not using delayed logging
    which returns ~2500MB/s for the same workload.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1a387d3be2b30c90f20d49a3497a8fc0693a9d18
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:46:31 2010 +1000

    xfs: dummy transactions should not dirty VFS state

    When we need to cover the log, we issue dummy transactions to ensure
    the current log tail is on disk. Unfortunately we currently use the
    root inode in the dummy transaction, and the act of committing the
    transaction dirties the inode at the VFS level.

    As a result, the VFS writeback of the dirty inode will prevent the
    filesystem from idling long enough for the log covering state
    machine to complete. The state machine gets stuck in a loop issuing
    new dummy transactions to cover the log and never makes progress.

    To avoid this problem, the dummy transactions should not cause
    externally visible state changes. To ensure this occurs, make sure
    that dummy transactions log an unchanging field in the superblock as
    its state is never propagated outside the filesystem.
    This allows the log covering state machine to complete successfully
    and the filesystem now correctly enters a fully idle state about 90s
    after the last modification was made.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 2fe33661fcd79d4c53022509f7223d526b5fa233
Author: Stuart Brodsky <sbrodsky@xxxxxxx>
Date:   Tue Aug 24 11:46:05 2010 +1000

    xfs: ensure f_ffree returned by statfs() is non-negative

    Because of delayed updates to the sb_icount field in the super
    block, it is possible to allocate over maxicount number of inodes.
    This causes the arithmetic to calculate a negative number of free
    inodes in user commands like df or stat -f.

    Since maxicount is a somewhat arbitrary number, a slight over
    allocation is not critical, but the value displayed by user commands
    should be 0 or greater and never go negative. To do this the value
    in the stats buffer f_ffree is capped to never go negative.

    [ Modified to use max_t as per Christoph's comment. ]

    Signed-off-by: Stu Brodsky <sbrodsky@xxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit efceab1d563153a2b1a6e7d35376241a48126989
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:44:56 2010 +1000

    xfs: handle negative wbc->nr_to_write during sync writeback

    During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
    go negative on inodes with more than 1024 dirty pages due to
    implementation details of write_cache_pages(). Currently XFS will
    abort page clustering in writeback once nr_to_write drops below
    zero, and so for data integrity writeback we will do very
    inefficient page at a time allocation and IO submission for inodes
    with large numbers of dirty pages.

    Fix this by only aborting the page clustering code when
    wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.
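The abort condition described above can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the code from the commit: `struct sketch_wbc`, `enum sketch_sync_mode`, and `sketch_stop_clustering()` are hypothetical stand-ins that only mirror the two `writeback_control` fields named in the text.

```c
#include <stdbool.h>

/* Hypothetical mirror of the sync modes named in the commit message. */
enum sketch_sync_mode {
	SKETCH_WB_SYNC_NONE,  /* background writeback, may stop early */
	SKETCH_WB_SYNC_ALL    /* data integrity writeback, must write everything */
};

/* Hypothetical mirror of the two writeback_control fields discussed. */
struct sketch_wbc {
	long nr_to_write;
	enum sketch_sync_mode sync_mode;
};

/*
 * Returns true when page clustering should be aborted: only when the
 * write budget has gone negative AND this is not data integrity writeback.
 */
static bool sketch_stop_clustering(const struct sketch_wbc *wbc)
{
	return wbc->nr_to_write < 0 &&
	       wbc->sync_mode == SKETCH_WB_SYNC_NONE;
}
```

With this predicate a WB_SYNC_ALL pass keeps clustering pages even after nr_to_write has dropped below zero, which is the behaviour the commit message asks for.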

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 4536f2ad8b330453d7ebec0746c4374eadd649b1
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:42:30 2010 +1000

    xfs: fix untrusted inode number lookup

    Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate
    untrusted inode numbers during lookup") changes the inode lookup
    code to do btree lookups for untrusted inode numbers. This change
    made an invalid assumption about the alignment of inodes and hence
    incorrectly calculated the first inode in the cluster. As a result,
    some inode numbers were being incorrectly considered invalid when
    they were actually valid.

    The issue was not picked up by the xfstests suite because it always
    runs fsr and dump (the two utilities that utilise the bulkstat
    interface) on cache hot inodes and hence the lookup code in the cold
    cache path was not sufficiently exercised to uncover this
    intermittent problem.

    Fix the issue by relaxing the btree lookup criteria and then
    checking if the record returned contains the inode number we are
    looking for. If we get an incorrect record, then the inode number is
    invalid.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 5b3eed756cd37255cad1181bd86bfd0977e97953
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:42:41 2010 +1000

    xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE

    Under heavy parallel metadata loads (e.g. dbench), we can fail to
    mark all the inodes in a cluster being freed as XFS_ISTALE as we
    skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
    When this happens and the inode cluster buffer has already been
    marked stale and freed, inode reclaim can try to write the inode out
    as it is dirty and not marked stale.
    This can result in writing the metadata to a freed extent, or, if it
    has already been overwritten, trigger a magic number check failure
    and return an EUCLEAN error such as:

    Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117

    Fix this by ensuring that we hoover up all in-memory inodes in the
    cluster and mark them XFS_ISTALE when freeing the cluster.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit d17c701ce6a548a92f7f8a3cec20299465f36ee3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 11:42:52 2010 +1000

    xfs: unlock items before allowing the CIL to commit

    When we commit a transaction using delayed logging, we need to
    unlock the items in the transaction before we unlock the CIL context
    and allow it to be checkpointed. If we unlock them after we release
    the CIL context lock, the CIL can checkpoint and complete before we
    free the log items. This breaks stale buffer item unlock and unpin
    processing as there is an implicit assumption that the unlock will
    occur before the unpin.

    Also, some log items need to store the LSN of the transaction commit
    in the item (inodes and EFIs) and so can race with other transaction
    completions if we don't prevent the CIL from checkpointing before
    the unlock occurs.

    Cc: <stable@xxxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 5f248c9c251c60af3403902b26e08de43964ea0b
Merge: f6cec0ae58c17522a7bc4e2f39dae19f199ab534 dca332528bc69e05f67161e1ed59929633d5e63d
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Tue Aug 10 11:26:52 2010 -0700

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
      no need for list_for_each_entry_safe()/resetting with superblock list
      Fix sget() race with failing mount
      vfs: don't hold s_umount over close_bdev_exclusive() call
      sysv: do not mark superblock dirty on remount
      sysv: do not mark superblock dirty on mount
      btrfs: remove junk sb_dirt change
      BFS: clean up the superblock usage
      AFFS: wait for sb synchronization when needed
      AFFS: clean up dirty flag usage
      cifs: truncate fallout
      mbcache: fix shrinker function return value
      mbcache: Remove unused features
      add f_flags to struct statfs(64)
      pass a struct path to vfs_statfs
      update VFS documentation for method changes.
      All filesystems that need invalidate_inode_buffers() are doing that explicitly
      convert remaining ->clear_inode() to ->evict_inode()
      Make ->drop_inode() just return whether inode needs to be dropped
      fs/inode.c:clear_inode() is gone
      fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
      ...

    Fix up trivial conflicts in fs/nilfs2/super.c

commit b57922d97fd6f79b6dbe6db0c4fd30d219fa08c1
Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date:   Mon Jun 7 14:34:48 2010 -0400

    convert remaining ->clear_inode() to ->evict_inode()

    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit a4ffdde6e56fdf8c34ddadc2674d6eb978083369
Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date:   Wed Jun 2 17:38:30 2010 -0400

    simplify checks for I_CLEAR/I_FREEING

    add I_CLEAR instead of replacing I_FREEING with it.
    I_CLEAR is equivalent to I_FREEING for almost all code looking at
    either; it's there to keep track of having called clear_inode()
    exactly once per inode lifetime, at some point after having set
    I_FREEING. I_CLEAR and I_FREEING never get set at the same time with
    the current code, so we can switch to setting i_flags to
    I_FREEING | I_CLEAR instead of I_CLEAR without loss of information.
    As the result of such change, checks become simpler and the amount
    of code that needs to know about I_CLEAR shrinks a lot.

    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit fa9b227e9019ebaeeb06224ba531a490f91144b3
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Jun 14 05:17:31 2010 -0400

    xfs: new truncate sequence

    Convert XFS to the new truncate sequence. We still can have errors
    after updating the file size in xfs_setattr, but these are real I/O
    errors and lead to a transaction abort and filesystem shutdown, so
    they are not an issue.

    Errors from ->write_begin and ->write_end can now be handled
    correctly because we can actually get rid of the delalloc extents,
    whereas previously the buffer state was stripped in
    block_invalidatepage.

    There is still no error handling for ->direct_IO, because doing so
    will need some major restructuring given that we only have the
    iolock shared and do not hold i_mutex at all. Fortunately leaving
    the normally allocated blocks behind there is not a major issue and
    this will get cleaned up by xfs_free_eofblock later.

    Note: the patch is against Al's vfs.git tree as that contains the
    necessary preparations. I'd prefer to get it applied there so that
    we can get some testing in linux-next.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit 155130a4f7848b1aac439cab6bda1a175507c71c
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Jun 4 11:29:58 2010 +0200

    get rid of block_write_begin_newtrunc

    Move the call to vmtruncate to get rid of excessive blocks to the
    callers in preparation for the new truncate sequence and rename the
    non-truncating version to block_write_begin.

    While we're at it also remove several unused arguments to
    block_write_begin.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit eafdc7d190a944c755a9fe68573c193e6e0217e7
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Jun 4 11:29:53 2010 +0200

    sort out blockdev_direct_IO variants

    Move the call to vmtruncate to get rid of excessive blocks to the
    callers in preparation for the new truncate calling sequence. This
    was only done for DIO_LOCKING filesystems, so the
    __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid
    of blockdev_direct_IO_no_locking and its _newtrunc variant while at
    it as just open-coding the two additional parameters is shorter than
    the name suffix.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit 90e0c225968f0878e090c7ff3f88323973476cee
Merge: 938a73b959cf77aadc41bded3bf416b618aa20b3 5f11e6a44059f728dddd8d0dbe5b4368ea93575b
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date:   Sat Aug 7 12:57:07 2010 -0700

    Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
      ext3: Fix dirtying of journalled buffers in data=journal mode
      ext3: default to ordered mode
      quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
      quota: Change quota error message to print out disk and function name
      MAINTAINERS: Update entries of ext2 and ext3
      MAINTAINERS: Update address of Andreas Dilger
      ext3: Avoid filesystem corruption after a crash under heavy delete load
      ext3: remove vestiges of nobh support
      ext3: Fix set but unused variables
      quota: clean up quota active checks
      quota: Clean up the namespace in dqblk_xfs.h
      quota: check quota reservation on remove_dquot_ref

commit ade7ce31c22e961dfbe1a6d57fd362c90c187cbd
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Fri Jun 4 10:56:01 2010 +0200

    quota: Clean up the namespace in dqblk_xfs.h

    Almost all identifiers use the FS_* namespace, so rename the missing
    few XFS_* ones to FS_* as well. Without this some people might get
    upset about having too many XFS names in generic code.

    Acked-by: Steven Whitehouse <swhiteho@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Jan Kara <jack@xxxxxxx>

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/linux-2.6/xfs_aops.c     |   75 +++++++++---
 fs/xfs/linux-2.6/xfs_iops.c     |   20 +---
 fs/xfs/linux-2.6/xfs_linux.h    |    2 -
 fs/xfs/linux-2.6/xfs_quotaops.c |   10 +-
 fs/xfs/linux-2.6/xfs_super.c    |   17 ++-
 fs/xfs/linux-2.6/xfs_sync.c     |   42 +------
 fs/xfs/linux-2.6/xfs_trace.h    |    2 +-
 fs/xfs/quota/xfs_qm_syscalls.c  |   32 +++---
 fs/xfs/xfs_fsops.c              |   31 +++--
 fs/xfs/xfs_fsops.h              |    2 +-
 fs/xfs/xfs_ialloc.c             |   16 ++-
 fs/xfs/xfs_inode.c              |   49 ++++----
 fs/xfs/xfs_log.c                |    7 +-
 fs/xfs/xfs_log_cil.c            |  263 +++++++++++++++++++++++----------------
 fs/xfs/xfs_log_priv.h           |   13 ++-
 fs/xfs/xfs_trans.c              |    5 +-
 fs/xfs/xfs_trans_priv.h         |    3 +-
 fs/xfs/xfs_vnodeops.c           |   38 +++---
 18 files changed, 350 insertions(+), 277 deletions(-)


hooks/post-receive
--
XFS development tree

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs