Thanks for your continued effort Dave. In the current implementation the first write to the location updates the metadata and must issue the flush. In Windows SQL Server can avoid this behavior. SQL Server can issue DeviceIoControl with SET_FILE_VALID_DATA and then SetEndOfFile. The SetEndOfFile acquires space and saves metadata without requiring the actual write. This allows us to quickly create a large file and the writes do not need the added flush. Is this something that fallocate could accommodate to avoid having to write once (triggers flush for metadata) and then secondary writes can use FUA and avoid the flush? -----Original Message----- From: Dave Chinner <david@xxxxxxxxxxxxx> Sent: Tuesday, May 1, 2018 9:46 PM To: Jan Kara <jack@xxxxxxx> Cc: linux-xfs@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; hch@xxxxxx; Robert Dorr <rdorr@xxxxxxxxxxxxx> Subject: Re: [PATCH 2/4] iomap: iomap_dio_rw() handles all sync writes On Sat, Apr 21, 2018 at 03:03:09PM +0200, Jan Kara wrote: > On Wed 18-04-18 14:08:26, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Currently iomap_dio_rw() only handles (data)sync write completions > > for AIO. This means we can't optimised non-AIO IO to minimise device > > flushes as we can't tell the caller whether a flush is required or > > not. > > > > To solve this problem and enable further optimisations, make > > iomap_dio_rw responsible for data sync behaviour for all IO, not > > just AIO. > > > > In doing so, the sync operation is now accounted as part of the DIO > > IO by inode_dio_end(), hence post-IO data stability updates will no > > long race against operations that serialise via inode_dio_wait() > > such as truncate or hole punch. > > > > Signed-Off-By: Dave Chinner <dchinner@xxxxxxxxxx> > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > Looks good to me. You can add: > > Reviewed-by: Jan Kara <jack@xxxxxxx> It looks good, but it's broken in a subtle, nasty way. :/ > > @@ -768,14 +776,8 @@ static ssize_t iomap_dio_complete(struct > > iomap_dio *dio) static void iomap_dio_complete_work(struct > > work_struct *work) { > > struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work); > > - struct kiocb *iocb = dio->iocb; > > - bool is_write = (dio->flags & IOMAP_DIO_WRITE); > > - ssize_t ret; > > > > - ret = iomap_dio_complete(dio); > > - if (is_write && ret > 0) > > - ret = generic_write_sync(iocb, ret); > > - iocb->ki_complete(iocb, ret, 0); > > + dio->iocb->ki_complete(dio->iocb, iomap_dio_complete(dio), 0); This generates a use after free from KASAN from generic/016. it appears the compiler orders the code so that it dereferences dio->iocb after iomap_dio_complete() has freed the dio structure (yay!). I'll post a new version of the patchset now that I've got changes to 2 of the 3 remaining patches in it.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx