On Tue, Apr 14, 2015 at 10:35:02AM -0400, Brian Foster wrote: > On Tue, Apr 14, 2015 at 05:26:47PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Currently a DIO overwrite that extends the EOF (e.g sub-block IO or > > write into allocated blocks beyond EOF) requires a transaction for > > the EOF update. Thi is done in IO completion context, but we aren't > > explicitly handling this situation properly and so it can run in > > interrupt context. Ensure that we defer IO that spans EOF correctly > > to the DIO completion workqueue, and now that we have an ioend in IO > > completion we can use the common ioend completion path to do all the > > work. > > > > Note: we do not preallocate the append transaction as we can have > > multiple mapping and allocation calls per direct IO. hence > > preallocating can still leave us with nested transactions by > > attempting to map and allocate more blocks after we've preallocated > > an append transaction. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_aops.c | 61 +++++++++++++++++++++++++++--------------------------- > > fs/xfs/xfs_trace.h | 1 + > > 2 files changed, 31 insertions(+), 31 deletions(-) > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > index e1fa926..e3968a3 100644 > > --- a/fs/xfs/xfs_aops.c > > +++ b/fs/xfs/xfs_aops.c > > @@ -1293,7 +1293,7 @@ xfs_map_direct( > > imap); > > } > > > > - if (ioend->io_type == XFS_IO_UNWRITTEN) > > + if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend)) > > set_buffer_defer_completion(bh_result); > > } > > > > @@ -1531,8 +1531,10 @@ xfs_end_io_direct_write( > > struct xfs_mount *mp = ip->i_mount; > > struct xfs_ioend *ioend = private; > > > > + trace_xfs_gbmap_direct_endio(ip, offset, size, ioend->io_type, NULL); > > + > > if (XFS_FORCED_SHUTDOWN(mp)) > > - goto out_destroy_ioend; > > + goto out_end_io; > > > > /* > > * dio completion end_io functions are only called on writes if more > > @@ -1553,40 +1555,37 @@ xfs_end_io_direct_write( > > ioend->io_offset = offset; > > > > /* > > - * While the generic direct I/O code updates the inode size, it does > > - * so only after the end_io handler is called, which means our > > - * end_io handler thinks the on-disk size is outside the in-core > > - * size. To prevent this just update it a little bit earlier here. > > + * The ioend tells us whether we are doing unwritten extent conversion > > + * or an append transaction that updates the on-disk file size. These > > + * cases are the only cases where we should *potentially* be needing > > + * to update the VFS inode size. When the ioend indicates this, we > > + * are *guaranteed* to be running in non-interrupt context. > > + * > > + * We need to update the in-core inode size here so that we don't end up > > + * with the on-disk inode size being outside the in-core inode size. > > + * While we can do this in the process context after the IO has > > + * completed, this does not work for AIO and hence we always update > > + * the in-core inode size here if necessary. > > */ > > - if (offset + size > i_size_read(inode)) > > - i_size_write(inode, offset + size); > > + if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend)) { > > + if (offset + size > i_size_read(inode)) > > + i_size_write(inode, offset + size); > > + } else > > + ASSERT(offset + size <= i_size_read(inode)); > > The code was obviously incorrect prior to this change, potentially > running some of these transactions in irq context. That said, it occurs > to me that one thing that the previous implementation looked to handle > that this does not is racing of in-flight aio with other operations. > E.g., what happens now if a non-extending, overwrite aio is submitted > and races with a truncate that causes it to be extending by the time we > get here? It looks like it would have been racy regardless, so maybe > that's just a separate problem... > Looking further, we actually wait on dio in the truncate path with IOLOCK_EXCL (e.g., similar to what we now do for extending aio itself), so this is probably irrelevant... Brian > > > > /* > > - * For direct I/O we do not know if we need to allocate blocks or not, > > - * so we can't preallocate an append transaction, as that results in > > - * nested reservations and log space deadlocks. Hence allocate the > > - * transaction here. While this is sub-optimal and can block IO > > - * completion for some time, we're stuck with doing it this way until > > - * we can pass the ioend to the direct IO allocation callbacks and > > - * avoid nesting that way. > > + * If we are doing an append IO that needs to update the EOF on disk, > > + * do the transaction reserve now so we can use common end io > > + * processing. Stashing the error (if there is one) in the ioend will > > + * result in the ioend processing passing on the error if it is > > + * possible as we can't return it from here. > > */ > > - if (ioend->io_type == XFS_IO_UNWRITTEN) { > > - xfs_iomap_write_unwritten(ip, offset, size); > > - } else if (offset + size > ip->i_d.di_size) { > > - struct xfs_trans *tp; > > - int error; > > - > > - tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS); > > - error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0); > > - if (error) { > > - xfs_trans_cancel(tp, 0); > > - goto out_destroy_ioend; > > - } > > + if (ioend->io_type == XFS_IO_OVERWRITE && xfs_ioend_is_append(ioend)) > > + ioend->io_error = xfs_setfilesize_trans_alloc(ioend); > > As you mentioned previously, we no longer need the transaction context > manipulation stuff in xfs_setfilesize_trans_alloc() with this approach. > It's still called from the writepage path though, so I guess it needs to > stay. > > Brian > > > > > - xfs_setfilesize(ip, tp, offset, size); > > - } > > -out_destroy_ioend: > > - xfs_destroy_ioend(ioend); > > +out_end_io: > > + xfs_end_io(&ioend->io_work); > > + return; > > } > > > > STATIC ssize_t > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > > index e78b64e..967993b 100644 > > --- a/fs/xfs/xfs_trace.h > > +++ b/fs/xfs/xfs_trace.h > > @@ -1224,6 +1224,7 @@ DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc); > > DEFINE_IOMAP_EVENT(xfs_gbmap_direct); > > DEFINE_IOMAP_EVENT(xfs_gbmap_direct_new); > > DEFINE_IOMAP_EVENT(xfs_gbmap_direct_update); > > +DEFINE_IOMAP_EVENT(xfs_gbmap_direct_endio); > > > > DECLARE_EVENT_CLASS(xfs_simple_io_class, > > TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count), > > -- > > 2.0.0 > > > > _______________________________________________ > > xfs mailing list > > xfs@xxxxxxxxxxx > > http://oss.sgi.com/mailman/listinfo/xfs > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs