The following script from Wu Fengguang shows very bad behaviour in XFS when aggressively dirtying data during a sync on XFS, with sync times up to almost 10 times as long as ext4. A large part of the issue is that XFS writes data out itself two times in the ->sync_fs method, overriding the lifelock protection in the core writeback code, and another issue is the lock-less xfs_ioend_wait call, which doesn't prevent new ioend from beeing queue up while waiting for the count to reach zero. This patch removes the XFS-internal sync calls and relies on the VFS to do it's work just like all other filesystems do, and just uses the internal inode iterator to wait for pending ioends, but now with the iolock held. With this fix we improve the sync time quite a bit, but we'll need further work split i_iocount between pending direct I/O requests that are only relevant for truncate, and pending unwritten extent conversion that matter for sync and fsync. ------------------------------ snip ------------------------------ #!/bin/sh umount /dev/sda7 mkfs.xfs -f /dev/sda7 # mkfs.ext4 /dev/sda7 # mkfs.btrfs /dev/sda7 mount /dev/sda7 /fs echo $((50<<20)) > /proc/sys/vm/dirty_bytes pid= for i in `seq 10` do dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 & pid="$pid $!" done sleep 1 tic=$(date +'%s') sync tac=$(date +'%s') echo echo sync time: $((tac-tic)) egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; } ------------------------------ snip ------------------------------ Reported-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> Signed-off-by: Christoph Hellwig <hch@xxxxxx> Index: xfs/fs/xfs/linux-2.6/xfs_sync.c =================================================================== --- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c 2011-06-17 14:16:18.442399481 +0200 +++ xfs/fs/xfs/linux-2.6/xfs_sync.c 2011-06-17 14:18:06.632394003 +0200 @@ -215,6 +215,19 @@ xfs_inode_ag_iterator( } STATIC int +xfs_wait_ioend_cb( + struct xfs_inode *ip, + struct xfs_perag *pag, + int flags) +{ + xfs_ilock(ip, XFS_IOLOCK_SHARED); + xfs_ioend_wait(ip); + xfs_iunlock(ip, XFS_IOLOCK_SHARED); + + return 0; +} + +STATIC int xfs_sync_inode_data( struct xfs_inode *ip, struct xfs_perag *pag, @@ -359,14 +372,15 @@ xfs_quiesce_data( { int error, error2 = 0; - /* push non-blocking */ - xfs_sync_data(mp, 0); xfs_qm_sync(mp, SYNC_TRYLOCK); - - /* push and block till complete */ - xfs_sync_data(mp, SYNC_WAIT); xfs_qm_sync(mp, SYNC_WAIT); + /* wait for all pending unwritten extent conversions */ + xfs_inode_ag_iterator(mp, xfs_wait_ioend_cb, 0); + + /* force out the newly dirtied log buffers */ + xfs_log_force(mp, XFS_LOG_SYNC); + /* write superblock and hoover up shutdown errors */ error = xfs_sync_fsdata(mp); _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs