Re: [PATCH 4/8] xfs: log file size updates at I/O completion time

Ben Myers <bpm@xxxxxxx> · Mon, 5 Mar 2012 14:00:18 -0600

On Wed, Feb 29, 2012 at 04:53:51AM -0500, Christoph Hellwig wrote:
> Do not use unlogged metadata updates and the VFS dirty bit for updating
> the file size after writeback.  In addition to causing various problems
> with updates getting delayed for far too long this also drags in the
> unscalable VFS dirty tracking, and is one of the few remaining unlogged
> metadata updates.
> 
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>

...

> @@ -341,18 +398,9 @@ xfs_submit_ioend_bio(
>  	xfs_ioend_t		*ioend,
>  	struct bio		*bio)
>  {
> -	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
>  	atomic_inc(&ioend->io_remaining);
>  	bio->bi_private = ioend;
>  	bio->bi_end_io = xfs_end_bio;
> -
> -	/*
> -	 * If the I/O is beyond EOF we mark the inode dirty immediately
> -	 * but don't update the inode size until I/O completion.
> -	 */
> -	if (xfs_new_eof(ip, ioend->io_offset + ioend->io_size))
> -		xfs_mark_inode_dirty(ip);
> -
>  	submit_bio(wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE, bio);
>  }
>  
> @@ -999,8 +1047,20 @@ xfs_vm_writepage(
>  				  wbc, end_index);
>  	}
>  
> -	if (iohead)
> +	if (iohead) {
> +		/*
> +		 * Reserve log space if we might write beyond the on-disk
> +		 * inode size.
> +		 */
> +		if (ioend->io_type != IO_UNWRITTEN &&
> +		    xfs_ioend_is_append(ioend)) {
			^^^

I suggest that xfs_ioend_is_append should look at every ioend in the
chain in order to determine if an append is possible, not just the
first.  Note that xfs_submit_ioend_bio above is called for each ioend in
the chain.  You'd only see this on a system with a larger page size than
filesystem block size. 

> +			err = xfs_setfilesize_trans_alloc(ioend);
> +			if (err)
> +				goto error;
> +		}
> +
>  		xfs_submit_ioend(wbc, iohead);
> +	}
>  
>  	return 0;
>  
> @@ -1280,17 +1340,33 @@ xfs_vm_direct_IO(
>  {
>  	struct inode		*inode = iocb->ki_filp->f_mapping->host;
>  	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
> +	struct xfs_ioend	*ioend = NULL;
>  	ssize_t			ret;
>  
>  	if (rw & WRITE) {
> -		iocb->private = xfs_alloc_ioend(inode, IO_DIRECT);
> +		size_t size = iov_length(iov, nr_segs);
> +
> +		/*
> +		 * Direct I/O code may have to convert unwritten extents from
> +		 * the AIO and I/O handler in interrupt context.  To make this
> +		 * possible we have to preallocate an ioend that allows defering
> +		 * it here.  For the case where we did not find an unwritten
> +		 * extent we'll free it again later.
> +		 */
> +		iocb->private = ioend = xfs_alloc_ioend(inode, IO_DIRECT);
> +		if (offset + size > XFS_I(inode)->i_d.di_size) {
> +			ret = xfs_setfilesize_trans_alloc(ioend);
> +			if (ret)
> +				goto out_destroy_ioend;
> +			ioend->io_isdirect = 1;
> +		}

Ouch.  I am really having trouble parsing that comment.  It looks that
here you are preallocating a transaction for this codepath:

bio_endio
  .bi_end_io
    dio_bio_end_aio
      dio_complete
        .end_io
	  xfs_end_io_direct_write
	    xfs_finish_ioend_sync
	      xfs_end_io	* where io_type != IO_UNWRITTEN
	        xfs_setfilesize

In the situation where we are converting an unwritten extent we cancel
the preallocated transaction and call xfs_iomap_write_unwritten where
the inode core is logged with the updated size.  We were already
allocating an ioend here, so when you said 'To make this possible we
have to preallocate an ioend that allows deferring it here', did you
really mean to say that we're preallocating the transaction?  Maybe
there are just to many 'its' in the comment or I'm just dense.

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs