On Wed, Sep 20, 2017 at 08:55:30AM -0400, Brian Foster wrote: > On Wed, Sep 20, 2017 at 07:05:04PM +0800, Eryu Guan wrote: > > On Tue, Sep 19, 2017 at 07:34:06AM -0700, Christoph Hellwig wrote: > > > On Tue, Sep 19, 2017 at 10:13:52AM -0400, Brian Foster wrote: > > > > Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it > > > > update the incore i_size after unwritten extent conversion? Then move > > > > (or remove) the associated update from xfs_dio_write_end_io(). > > > > > > I don't think we even need a flag - all three callers of > > > xfs_iomap_write_unwritten want to update the file size. > > > > I tried this approach, but seems there's some problem in the buffered > > aio path, generic/112 (aio fsx) failed quickly. But I haven't digged > > into the reason (maybe I screwed it up, not the method is wrong..). > > > > From the generic/112 results: > > Size error: expected 0x3785b stat 0x38000 seek 0x38000 > > I suspect the problem is that the offset+size from buffered I/O > completion is not based on the inode size. Rather, it is buffer head > granularity size of the ioend. Given that, it probably does make sense > to skip the update from this path. Yeah, you're right. xfs_io -fc "falloc -k 0 4k" -c "pwrite 0 2k" -c fsync /mnt/xfs/testfile This results in 4k file size (block size is also 4k). > > > Then I tried Brian's suggestion, pass a boolean to > > xfs_iomap_write_unwritten() to tell if we want it to update in-core > > isize after unwritten extent conversion, and skip the in-core isize > > update in xfs_dio_write_end_io() accordingly. This approach seems to > > work, it passed the test Eric posted here, and fstests 'aio' group > > tests, a run of 'quick' group didn't find any new failure as well. > > > > I attached the WIP patch (without proper comments) I was testing, if > > this looks fine I can format a formal patch and do more testings. > > > > Thanks. This mostly looks reasonable to me... > > > Thanks, > > Eryu > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > index 29172609f2a3..288da47e9ac5 100644 > > --- a/fs/xfs/xfs_aops.c > > +++ b/fs/xfs/xfs_aops.c > > @@ -343,7 +343,7 @@ xfs_end_io( > > error = xfs_reflink_end_cow(ip, offset, size); > > break; > > case XFS_IO_UNWRITTEN: > > - error = xfs_iomap_write_unwritten(ip, offset, size); > > + error = xfs_iomap_write_unwritten(ip, offset, size, false); > > Maybe add a single line comment here wrt to the above (why we don't > update isize). Will do. > > > break; > > default: > > ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans); > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 350b6d43ba23..f3ad024573e7 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -435,6 +435,7 @@ xfs_dio_write_end_io( > > struct xfs_inode *ip = XFS_I(inode); > > loff_t offset = iocb->ki_pos; > > bool update_size = false; > > + bool write_unwritten = (flags & IOMAP_DIO_UNWRITTEN); > > int error = 0; > > > > trace_xfs_end_io_direct_write(ip, offset, size); > > @@ -458,7 +459,8 @@ xfs_dio_write_end_io( > > */ > > spin_lock(&ip->i_flags_lock); > > if (offset + size > i_size_read(inode)) { > > - i_size_write(inode, offset + size); > > + if (!write_unwritten) > > + i_size_write(inode, offset + size); > > update_size = true; > > I find the logic a little confusing here. For !write_unwritten, > update_size means to update the on-disk size. Otherwise, it instructs > iomap_write_unwritten() to also update the in-core size. The latter also > checks for appends, however, so it might as well always be true from > here. It seems that for such a small function, we should be able to make > this a bit easier to follow. ;P Agreed, it looks confusing when update_size serves as indicators of both in-core and on-disk size update. And we can pass 'true' unconditionally to xfs_iomap_write_unwritten() in this case. > > Should we ever see an isize update at all on a IOMAP_DIO_COW completion > (wouldn't reflinked blocks have to be within eof)? If not, then we can > presumably rule out isize updates in that case. I think that just leaves > the case where a dio write occurs on a pre-existing block. Hmm, could we > just move this whole hunk down to after the iomap_write_unwritten() call > and eliminate the need for the flag entirely? E.g., something like: > > ... > if (IOMAP_DIO_COW) { > ... > } > > /* unwritten conversion updates isize */ > if (IOMAP_DIO_UNWRITTEN) > return xfs_iomap_write_unwritten(ip, offset, size, true); > > if (offset + size > i_size_read(inode)) { > i_size_write(inode, offset + size); > error = xfs_setfilesize(...); > } > > return error; This seems fine, tests in 'clone' group all passed without new failures, I'll update patch as you suggested. I thought about something like this too, but I'm not familiar with the CoW path, so I decided to keep the original logic as much as possible. Thanks for your review and suggestions! Eryu > > > } > > spin_unlock(&ip->i_flags_lock); > > @@ -469,8 +471,8 @@ xfs_dio_write_end_io( > > return error; > > } > > > > - if (flags & IOMAP_DIO_UNWRITTEN) > > - error = xfs_iomap_write_unwritten(ip, offset, size); > > + if (write_unwritten) > > + error = xfs_iomap_write_unwritten(ip, offset, size, update_size); > > else if (update_size) > > error = xfs_setfilesize(ip, offset, size); > > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > > index a1909bc064e9..0a088586371e 100644 > > --- a/fs/xfs/xfs_iomap.c > > +++ b/fs/xfs/xfs_iomap.c > > @@ -829,7 +829,8 @@ int > > xfs_iomap_write_unwritten( > > xfs_inode_t *ip, > > xfs_off_t offset, > > - xfs_off_t count) > > + xfs_off_t count, > > + bool update_size) > > { > > xfs_mount_t *mp = ip->i_mount; > > xfs_fileoff_t offset_fsb; > > @@ -840,6 +841,7 @@ xfs_iomap_write_unwritten( > > xfs_trans_t *tp; > > xfs_bmbt_irec_t imap; > > struct xfs_defer_ops dfops; > > + struct inode *inode = VFS_I(ip); > > xfs_fsize_t i_size; > > uint resblks; > > int error; > > @@ -900,6 +902,13 @@ xfs_iomap_write_unwritten( > > if (i_size > offset + count) > > i_size = offset + count; > > > > + if (update_size) { > > + spin_lock(&ip->i_flags_lock); > > + if (i_size > i_size_read(inode)) > > + i_size_write(inode, i_size); > > + spin_unlock(&ip->i_flags_lock); > > We have XFS_ILOCK_EXCL here so I don't think the spinlocks are > necessary any longer. That means this could probably be condensed to > something like > > if (update_size && i_size > i_size_read(inode)) > i_size_write(inode, i_size) > > > + } > > + > > i_size = xfs_new_eof(ip, i_size); > > if (i_size) { > > ip->i_d.di_size = i_size; > > diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h > > index 00db3ecea084..ee535065c5d0 100644 > > --- a/fs/xfs/xfs_iomap.h > > +++ b/fs/xfs/xfs_iomap.h > > @@ -27,7 +27,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t, > > struct xfs_bmbt_irec *, int); > > int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t, > > struct xfs_bmbt_irec *); > > -int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t); > > +int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool); > > > > void xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *, > > struct xfs_bmbt_irec *); > > diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c > > index 2f2dc3c09ad0..4246876df7b7 100644 > > --- a/fs/xfs/xfs_pnfs.c > > +++ b/fs/xfs/xfs_pnfs.c > > @@ -274,7 +274,7 @@ xfs_fs_commit_blocks( > > (end - 1) >> PAGE_SHIFT); > > WARN_ON_ONCE(error); > > > > - error = xfs_iomap_write_unwritten(ip, start, length); > > + error = xfs_iomap_write_unwritten(ip, start, length, false); > > Note that this path does update isize. It runs another transaction to > get around the fact that write_unwritten() wouldn't log a new on disk > size. There is some validation thing here though, so we might need to > check whether it is Ok to run that earlier and whether it should > continue to return an error on failure. Christoph? > > That said, maybe a follow on patch would be better since this one might > be stable fodder. > > Brian > > > if (error) > > goto out_drop_iolock; > > } > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html