On Mon, Feb 20, 2012 at 07:38:28PM -0500, Christoph Hellwig wrote: > Do not use unlogged metadata updates and the VFS dirty bit for updating > the file size after writeback. In addition to causing various problems > with updates getting delayed for far too long this also drags in the > unscalable VFS dirty tracking, and is one of the few remaining unlogged > metadata updates. > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> .... > > +STATIC int > +xfs_setfilesize_trans_alloc( > + struct xfs_ioend *ioend) > +{ .... > + /* > + * We hand off the transaction to the completion thread now, so > + * clear the flag here. > + */ > + current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS); ...... > @@ -173,18 +214,32 @@ xfs_end_io( > * range to normal written extens after the data I/O has finished. > */ > if (ioend->io_type == IO_UNWRITTEN) { > + /* > + * For buffered I/O we never preallocate a transaction when > + * doing the unwritten extent conversion, but for direct I/O > + * we do not know if we are converting and unwritten extent > + * or not at the point where we preallocate the transaction. > + */ > + if (ioend->io_append_trans) { > + ASSERT(ioend->io_isdirect); > + > + current_set_flags_nested( > + &ioend->io_append_trans->t_pflags, PF_FSTRANS); > + xfs_trans_cancel(ioend->io_append_trans, 0); ..... > @@ -1280,17 +1340,33 @@ xfs_vm_direct_IO( > { > struct inode *inode = iocb->ki_filp->f_mapping->host; > struct block_device *bdev = xfs_find_bdev_for_inode(inode); > + struct xfs_ioend *ioend = NULL; > ssize_t ret; > > if (rw & WRITE) { > - iocb->private = xfs_alloc_ioend(inode, IO_DIRECT); > + size_t size = iov_length(iov, nr_segs); > + > + /* > + * Direct I/O code may have to convert unwritten extents from > + * the AIO and I/O handler in interrupt context. To make this > + * possible we have to preallocate an ioend that allows defering > + * it here. For the case where we did not find an unwritten > + * extent we'll free it again later. > + */ > + iocb->private = ioend = xfs_alloc_ioend(inode, IO_DIRECT); > + if (offset + size > XFS_I(inode)->i_d.di_size) { > + ret = xfs_setfilesize_trans_alloc(ioend); > + if (ret) > + goto destroy_ioend; > + ioend->io_isdirect = 1; > + } > > ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov, > offset, nr_segs, > xfs_get_blocks_direct, > xfs_end_io_direct_write, NULL, 0); > if (ret != -EIOCBQUEUED && iocb->private) > - xfs_destroy_ioend(iocb->private); > + goto destroy_ioend; > } else { > ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov, > offset, nr_segs, > @@ -1299,6 +1375,12 @@ xfs_vm_direct_IO( > } > > return ret; > + > +destroy_ioend: > + if (ioend->io_append_trans) > + xfs_trans_cancel(ioend->io_append_trans, 0); Hmmm, I missed something here first time through: do we need to restore the PF_FSTRANS flag here before cancelling the transaction? I don't think we do (clearing a flag that is aready cleared is not a problem), but we should be consistent with the IO completion handling of this transaction. Other than that, everything looks good, so you can add: Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> When that inconsistency is fixed. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs