On Sun, Dec 18, 2011 at 10:50:15AM -0500, Christoph Hellwig wrote: > Since Linux 2.6.36 the writeback code has introduces various measures for > live lock prevention during sync(). Unfortunately some of these are > actively harmful for the XFS model, where the inode gets marked dirty for > metadata from the data I/O handler. > > The older_than_this checks that are now more strictly enforced since > > writeback: avoid livelocking WB_SYNC_ALL writeback > > by only calling into __writeback_inodes_sb and thus only sampling the > current cut off time once. But on a slow enough devices the previous > asynchronous sync pass might not have fully completed yet, and thus XFS > might mark metadata dirty only after that sampling of the cut off time for > the blocking pass already happened. I have not myself reproduced this > myself on a real system, but by introducing artificial delay into the > XFS I/O completion workqueues it can be reproduced easily. > > Fix this by iterating over all XFS inodes in ->sync_fs and log all that > are dirty. This might log inode that only got redirtied after the > previous pass, but given how cheap delayed logging of inodes is it > isn't a major concern for performance. > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > Index: xfs/fs/xfs/xfs_sync.c > =================================================================== > --- xfs.orig/fs/xfs/xfs_sync.c 2011-12-14 05:33:06.436599621 -0800 > +++ xfs/fs/xfs/xfs_sync.c 2011-12-14 05:38:49.084743337 -0800 > @@ -336,6 +336,29 @@ xfs_sync_fsdata( > return error; > } > > +int > +xfs_log_inode( > + struct xfs_inode *ip, > + struct xfs_perag *pag, > + int flags) > +{ > + struct xfs_mount *mp = ip->i_mount; > + struct xfs_trans *tp; > + int error; > + > + tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS); > + error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0); > + if (error) { > + xfs_trans_cancel(tp, 0); > + return error; > + } > + > + xfs_ilock(ip, XFS_ILOCK_EXCL); > + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); > + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); > + return xfs_trans_commit(tp, 0); > +} This will do a transaction on the inode, clean or dirty. That's an awful lot of overhead for the few inodes (out of perhaps millions in memory) that actually need it. with the ->dirty_inode callback from the VFS, we know the only inodes that need logging are those with i_update_core set.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs