On Tue, Apr 14, 2015 at 10:35:19AM -0400, Brian Foster wrote: > On Tue, Apr 14, 2015 at 05:26:48PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > DIO writes that lie entirely within EOF have nothing to do in IO > > completion. In this case, we don't need no steekin' ioend, and so we > > can avoid allocating an ioend until we have a mapping that spans > > EOF. > > > > This means that IO completion has two contexts - deferred completion > > to the dio workqueue that uses an ioend, and interrupt completion > > that does nothing because there is nothing that can be done in this > > context. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_aops.c | 62 ++++++++++++++++++++++++++++++------------------------ > > fs/xfs/xfs_trace.h | 1 + > > 2 files changed, 36 insertions(+), 27 deletions(-) > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > index e3968a3..55356f6 100644 > > --- a/fs/xfs/xfs_aops.c > > +++ b/fs/xfs/xfs_aops.c > > @@ -1234,15 +1234,19 @@ xfs_vm_releasepage( > > } > > > > /* > > - * When we map a DIO buffer, we need to attach an ioend that describes the type > > + * When we map a DIO buffer, we may need to attach an ioend that describes the type > > * of write IO we are doing. This passes to the completion function the > > - * operations it needs to perform. > > + * operations it needs to perform. If the mapping is for an overwrite wholly > > + * within the EOF then we don't need an ioend and so we don't allocate one. This > > + * avoids the unnecessary overhead of allocating and freeing ioends for > > + * workloads that don't require transactions on IO completion. > > * > > * If we get multiple mappings to in a single IO, we might be mapping dfferent > > * types. But because the direct IO can only have a single private pointer, we > > * need to ensure that: > > * > > - * a) the ioend spans the entire region of the IO; and > > + * a) i) the ioend spans the entire region of unwritten mappings; or > > + * ii) the ioend spans all the mappings that cross or are beyond EOF; and > > * b) if it contains unwritten extents, it is *permanently* marked as such > > * > > * We could do this by chaining ioends like buffered IO does, but we only > > @@ -1283,7 +1287,8 @@ xfs_map_direct( > > trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset, > > ioend->io_size, ioend->io_type, > > imap); > > - } else { > > + } else if (type == XFS_IO_UNWRITTEN || > > + offset + size > i_size_read(inode)) { > > ioend = xfs_alloc_ioend(inode, type); > > ioend->io_offset = offset; > > ioend->io_size = size; > > @@ -1291,10 +1296,13 @@ xfs_map_direct( > > > > trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type, > > imap); > > + } else { > > + trace_xfs_gbmap_direct_none(XFS_I(inode), offset, size, type, > > + imap); > > Do we really need a tracepoint to indicate none of the other tracepoints > were hit? It stands out to me only because we already have the > unconditional trace_xfs_gbmap_direct() above. I'd say kill one or the > other, but I think we really want the function entry one because it > disambiguates individual get_block instances from the aggregate mapping. I found this incredibly useful in debugging this code, because it told me exactly what each mapping call was doing, and from that I could see if it was doing the right thing. Yes, i could infer it from the entry trace point, but grepping on the entry tracepoint gets *all* the mapping calls, not just the overwrites wholly within EOF... > > + return; > > } > > > > - if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend)) > > - set_buffer_defer_completion(bh_result); > > + set_buffer_defer_completion(bh_result); > > I'd move this up into the block where we allocate an ioend. That's the > only place we need it and doing so eliminates the need for the 'else { > return; }' thing entirely. Yeah, that would work, too. > > STATIC void > > xfs_end_io_direct_write( > > @@ -1531,7 +1541,10 @@ xfs_end_io_direct_write( > > struct xfs_mount *mp = ip->i_mount; > > struct xfs_ioend *ioend = private; > > > > - trace_xfs_gbmap_direct_endio(ip, offset, size, ioend->io_type, NULL); > > + trace_xfs_gbmap_direct_endio(ip, offset, size, > > + ioend ? ioend->io_type : 0, NULL); > > + if (!ioend) > > + return; > > Can we keep the i_size assert we've lost below? > > ASSERT(offset + size <= i_size_read(inode)); Sure, I can add it for that case. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs