On Thu, Feb 21, 2019 at 12:58:17PM -0500, Brian Foster wrote: > > + /* > > + * For buffered writes we need to report the address of the > > + * previous block (if there was any) so that the higher level > > + * write code can perform read-modify-write operations. For > > + * direct I/O code, which must be block aligned we need to > > + * report the newly allocated address. > > + */ > > + if (!(flags & IOMAP_DIRECT) && > > + orig.br_startblock != HOLESTARTBLOCK) > > + imap = orig; > > I find the logic here kind of confusing. The buffered write (reflink) > path basically expects to allocated COW blocks over an existing shared > extent. It thus makes no modification to the caller's imap because it > (read-modify-)writes into cache and writeback determines where to send > the I/O. Why not follow the same flow here? For example: This is to optimize for the command case. Both in direct I/O being actually common over extent size hints, and also over this being the sensible behavior while the buffered I/O behavior of returning the old map is somewhat odd. I have outstanding todo items to switch extent size hint based buffered I/O to use delalloc reservations, and to clean up how the iomap code currently hacks around the lack of a clear interface for the read-modify-write cycles in buffered I/O, both of which would remove this hack above without touching the surrounding code. > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > > index 2babc2cbe103..8a5353daf9ab 100644 > > --- a/fs/xfs/xfs_reflink.c > > +++ b/fs/xfs/xfs_reflink.c > > @@ -397,7 +397,8 @@ xfs_reflink_allocate_cow( > > struct xfs_inode *ip, > > struct xfs_bmbt_irec *imap, > > bool *shared, > > - uint *lockmode) > > + uint *lockmode, > > + unsigned iomap_flags) > > I don't see why a lower level reflink mechanism needs to care about > direct I/O or not. IMO this should just be a 'bool convert' param. My memory is a little vague, but I think Darrick preferred it this way.