On Wed, Sep 20, 2017 at 07:05:04PM +0800, Eryu Guan wrote: > On Tue, Sep 19, 2017 at 07:34:06AM -0700, Christoph Hellwig wrote: > > On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote: > > > Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it > > > update the incore i_size after unwritten extent conversion? Then move > > > (or remove) the associated update from xfs_dio_write_end_io(). > > > > I don't think we even need a flag - all three callers of > > xfs_iomap_write_unwritten want to update the file size. > > I tried this approach, but seems there's some problem in the buffered > aio path, generic/112 (aio fsx) failed quickly. But I haven't digged > into the reason (maybe I screwed it up, not the method is wrong..). > >From the generic/112 results: Size error: expected 0x3785b stat 0x38000 seek 0x38000 I suspect the problem is that the offset+size from buffered I/O completion is not based on the inode size. Rather, it is buffer head granularity size of the ioend. Given that, it probably does make sense to skip the update from this path. > Then I tried Brian's suggestion, pass a boolean to > xfs_iomap_write_unwritten() to tell if we want it to update in-core > isize after unwritten extent conversion, and skip the in-core isize > update in xfs_dio_write_end_io() accordingly. This approach seems to > work, it passed the test Eric posted here, and fstests 'aio' group > tests, a run of 'quick' group didn't find any new failure as well. > > I attached the WIP patch (without proper comments) I was testing, if > this looks fine I can format a formal patch and do more testings. > Thanks. This mostly looks reasonable to me... > Thanks, > Eryu > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 29172609f2a3..288da47e9ac5 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -343,7 +343,7 @@ xfs_end_io( > error = xfs_reflink_end_cow(ip, offset, size); > break; > case XFS_IO_UNWRITTEN: > - error = xfs_iomap_write_unwritten(ip, offset, size); > + error = xfs_iomap_write_unwritten(ip, offset, size, false); Maybe add a single line comment here wrt to the above (why we don't update isize). > break; > default: > ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans); > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 350b6d43ba23..f3ad024573e7 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -435,6 +435,7 @@ xfs_dio_write_end_io( > struct xfs_inode *ip = XFS_I(inode); > loff_t offset = iocb->ki_pos; > bool update_size = false; > + bool write_unwritten = (flags & IOMAP_DIO_UNWRITTEN); > int error = 0; > > trace_xfs_end_io_direct_write(ip, offset, size); > @@ -458,7 +459,8 @@ xfs_dio_write_end_io( > */ > spin_lock(&ip->i_flags_lock); > if (offset + size > i_size_read(inode)) { > - i_size_write(inode, offset + size); > + if (!write_unwritten) > + i_size_write(inode, offset + size); > update_size = true; I find the logic a little confusing here. For !write_unwritten, update_size means to update the on-disk size. Otherwise, it instructs iomap_write_unwritten() to also update the in-core size. The latter also checks for appends, however, so it might as well always be true from here. It seems that for such a small function, we should be able to make this a bit easier to follow. ;P Should we ever see an isize update at all on a IOMAP_DIO_COW completion (wouldn't reflinked blocks have to be within eof)? If not, then we can presumably rule out isize updates in that case. I think that just leaves the case where a dio write occurs on a pre-existing block. Hmm, could we just move this whole hunk down to after the iomap_write_unwritten() call and eliminate the need for the flag entirely? E.g., something like: ... if (IOMAP_DIO_COW) { ... } /* unwritten conversion updates isize */ if (IOMAP_DIO_UNWRITTEN) return xfs_iomap_write_unwritten(ip, offset, size, true); if (offset + size > i_size_read(inode)) { i_size_write(inode, offset + size); error = xfs_setfilesize(...); } return error; > } > spin_unlock(&ip->i_flags_lock); > @@ -469,8 +471,8 @@ xfs_dio_write_end_io( > return error; > } > > - if (flags & IOMAP_DIO_UNWRITTEN) > - error = xfs_iomap_write_unwritten(ip, offset, size); > + if (write_unwritten) > + error = xfs_iomap_write_unwritten(ip, offset, size, update_size); > else if (update_size) > error = xfs_setfilesize(ip, offset, size); > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index a1909bc064e9..0a088586371e 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -829,7 +829,8 @@ int > xfs_iomap_write_unwritten( > xfs_inode_t *ip, > xfs_off_t offset, > - xfs_off_t count) > + xfs_off_t count, > + bool update_size) > { > xfs_mount_t *mp = ip->i_mount; > xfs_fileoff_t offset_fsb; > @@ -840,6 +841,7 @@ xfs_iomap_write_unwritten( > xfs_trans_t *tp; > xfs_bmbt_irec_t imap; > struct xfs_defer_ops dfops; > + struct inode *inode = VFS_I(ip); > xfs_fsize_t i_size; > uint resblks; > int error; > @@ -900,6 +902,13 @@ xfs_iomap_write_unwritten( > if (i_size > offset + count) > i_size = offset + count; > > + if (update_size) { > + spin_lock(&ip->i_flags_lock); > + if (i_size > i_size_read(inode)) > + i_size_write(inode, i_size); > + spin_unlock(&ip->i_flags_lock); We have XFS_ILOCK_EXCL here so I don't think the spinlocks are necessary any longer. That means this could probably be condensed to something like if (update_size && i_size > i_size_read(inode)) i_size_write(inode, i_size) > + } > + > i_size = xfs_new_eof(ip, i_size); > if (i_size) { > ip->i_d.di_size = i_size; > diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h > index 00db3ecea084..ee535065c5d0 100644 > --- a/fs/xfs/xfs_iomap.h > +++ b/fs/xfs/xfs_iomap.h > @@ -27,7 +27,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t, > struct xfs_bmbt_irec *, int); > int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t, > struct xfs_bmbt_irec *); > -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t); > +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool); > > void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *, > struct xfs_bmbt_irec *); > diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c > index 2f2dc3c09ad0..4246876df7b7 100644 > --- a/fs/xfs/xfs_pnfs.c > +++ b/fs/xfs/xfs_pnfs.c > @@ -274,7 +274,7 @@ xfs_fs_commit_blocks( > (end - 1) >> PAGE_SHIFT); > WARN_ON_ONCE(error); > > - error = xfs_iomap_write_unwritten(ip, start, length); > + error = xfs_iomap_write_unwritten(ip, start, length, false); Note that this path does update isize. It runs another transaction to get around the fact that write_unwritten() wouldn't log a new on disk size. There is some validation thing here though, so we might need to check whether it is Ok to run that earlier and whether it should continue to return an error on failure. Christoph? That said, maybe a follow on patch would be better since this one might be stable fodder. Brian > if (error) > goto out_drop_iolock; > } > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html