On Wed, Feb 29, 2012 at 04:53:51AM -0500, Christoph Hellwig wrote: > Do not use unlogged metadata updates and the VFS dirty bit for updating > the file size after writeback. In addition to causing various problems > with updates getting delayed for far too long this also drags in the > unscalable VFS dirty tracking, and is one of the few remaining unlogged > metadata updates. > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> > Signed-off-by: Christoph Hellwig <hch@xxxxxx> ... > @@ -341,18 +398,9 @@ xfs_submit_ioend_bio( > xfs_ioend_t *ioend, > struct bio *bio) > { > - struct xfs_inode *ip = XFS_I(ioend->io_inode); > atomic_inc(&ioend->io_remaining); > bio->bi_private = ioend; > bio->bi_end_io = xfs_end_bio; > - > - /* > - * If the I/O is beyond EOF we mark the inode dirty immediately > - * but don't update the inode size until I/O completion. > - */ > - if (xfs_new_eof(ip, ioend->io_offset + ioend->io_size)) > - xfs_mark_inode_dirty(ip); > - > submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio); > } > > @@ -999,8 +1047,20 @@ xfs_vm_writepage( > wbc, end_index); > } > > - if (iohead) > + if (iohead) { > + /* > + * Reserve log space if we might write beyond the on-disk > + * inode size. > + */ > + if (ioend->io_type != IO_UNWRITTEN && > + xfs_ioend_is_append(ioend)) { ^^^ I suggest that xfs_ioend_is_append should look at every ioend in the chain in order to determine if an append is possible, not just the first. Note that xfs_submit_ioend_bio above is called for each ioend in the chain. You'd only see this on a system with a larger page size than filesystem block size. > + err = xfs_setfilesize_trans_alloc(ioend); > + if (err) > + goto error; > + } > + > xfs_submit_ioend(wbc, iohead); > + } > > return 0; > > @@ -1280,17 +1340,33 @@ xfs_vm_direct_IO( > { > struct inode *inode = iocb->ki_filp->f_mapping->host; > struct block_device *bdev = xfs_find_bdev_for_inode(inode); > + struct xfs_ioend *ioend = NULL; > ssize_t ret; > > if (rw & WRITE) { > - iocb->private = xfs_alloc_ioend(inode, IO_DIRECT); > + size_t size = iov_length(iov, nr_segs); > + > + /* > + * Direct I/O code may have to convert unwritten extents from > + * the AIO and I/O handler in interrupt context. To make this > + * possible we have to preallocate an ioend that allows defering > + * it here. For the case where we did not find an unwritten > + * extent we'll free it again later. > + */ > + iocb->private = ioend = xfs_alloc_ioend(inode, IO_DIRECT); > + if (offset + size > XFS_I(inode)->i_d.di_size) { > + ret = xfs_setfilesize_trans_alloc(ioend); > + if (ret) > + goto out_destroy_ioend; > + ioend->io_isdirect = 1; > + } Ouch. I am really having trouble parsing that comment. It looks that here you are preallocating a transaction for this codepath: bio_endio .bi_end_io dio_bio_end_aio dio_complete .end_io xfs_end_io_direct_write xfs_finish_ioend_sync xfs_end_io * where io_type != IO_UNWRITTEN xfs_setfilesize In the situation where we are converting an unwritten extent we cancel the preallocated transaction and call xfs_iomap_write_unwritten where the inode core is logged with the updated size. We were already allocating an ioend here, so when you said 'To make this possible we have to preallocate an ioend that allows deferring it here', did you really mean to say that we're preallocating the transaction? Maybe there are just to many 'its' in the comment or I'm just dense. Thanks, Ben _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs