On Thu, Nov 17, 2011 at 01:25:48PM -0600, Ben Myers wrote: > Hey Christoph, > > On Tue, Nov 15, 2011 at 03:14:12PM -0500, Christoph Hellwig wrote: > > Do not use unlogged metadata updates and the VFS dirty bit for updating > > the file size after writeback. In addition to causing various problems > > with updates getting delayed for far too log this also drags in the > > unscalable VFS dirty tracking, and is one of the few remaining unlogged > > metadata updates. > > > > Note that we allocate a new transaction from the I/O completion handler. > > While this sounds fairly dangerous it isn't an issue in practice given > > that any appending write alreay had to start a transaction in writepages > > to allocate blocks, and we'll start throtteling there if we run low on > > log space or memory. > > > > We could still occasionally stall in the completion handler, but given > > that we have per-filesystems workqueues for the I/O completions, > > and completions that do not have to either convert unwritten extents > > or update the file size are processed from interrupt context we do not > > have to worry about this stalling a system to death. > > > > In addition to that implementing log reservations from ->writepage that > > are only released by a different thread requires a lot of work, and even > > with that wasn't quite doable in a deadlock-free manner. > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > > --- > > fs/xfs/xfs_aops.c | 49 ++++++++++++++++++++++++++++--------------------- > > fs/xfs/xfs_file.c | 36 +++++++++++++++++++++++++++++++++++- > > 2 files changed, 63 insertions(+), 22 deletions(-) > > > > Index: linux-2.6/fs/xfs/xfs_aops.c > > =================================================================== > > --- linux-2.6.orig/fs/xfs/xfs_aops.c 2011-11-15 18:43:04.183113001 +0100 > > +++ linux-2.6/fs/xfs/xfs_aops.c 2011-11-15 18:43:05.059779662 +0100 > > @@ -26,6 +26,7 @@ > > #include "xfs_bmap_btree.h" > > #include "xfs_dinode.h" > > #include "xfs_inode.h" > > +#include "xfs_inode_item.h" > > #include "xfs_alloc.h" > > #include "xfs_error.h" > > #include "xfs_rw.h" > > @@ -114,22 +115,39 @@ static inline bool xfs_ioend_is_append(s > > * not extend all the way to the valid file size then restrict this update to > > * the end of the write. > > */ > > -STATIC void > > +STATIC int > > xfs_setfilesize( > > struct xfs_ioend *ioend) > > { > > struct xfs_inode *ip = XFS_I(ioend->io_inode); > > + struct xfs_mount *mp = ip->i_mount; > > + struct xfs_trans *tp; > > xfs_fsize_t isize; > > + int error = 0; > > > > xfs_ilock(ip, XFS_ILOCK_EXCL); > > isize = xfs_new_eof(ip, ioend->io_offset + ioend->io_size); > > - if (isize) { > > - trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size); > > - ip->i_d.di_size = isize; > > - xfs_mark_inode_dirty(ip); > > + xfs_iunlock(ip, XFS_ILOCK_EXCL); > > + > > + if (!isize) > > + return 0; > > + > > + trace_xfs_setfilesize(ip, ioend->io_offset, ioend->io_size); > > + > > + tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS); > > + error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0); > > + if (error) { > > + xfs_trans_cancel(tp, 0); > > + return error; > > } > > > > - xfs_iunlock(ip, XFS_ILOCK_EXCL); > > + xfs_ilock(ip, XFS_ILOCK_EXCL); > > + > > + ip->i_d.di_size = isize; > > Make this assignment above, before dropping the ilock, so that multiple > updates to di_size cannot be reordered while allocating a transaction. Which then makes it a non-transactional change. i.e. it can be written to disk without having been logged in a transaction. That defeats the purpose of this change.... I think what the code needs to do here is recheck whether the size update needs to be done, and if not cancel the transaction and do nothing. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs