On Tue, Jan 14, 2014 at 03:30:11PM +0200, Sergey Meirovich wrote:
> Hi Christoph,
>
> On 8 January 2014 16:03, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > On Tue, Jan 07, 2014 at 08:37:23PM +0200, Sergey Meirovich wrote:
> >> Actually my initial report (14.67Mb/sec 3755.41 Requests/sec) was about ext4.
> >> However, I have tried XFS as well. It was a bit slower than ext4 on all
> >> occasions.
> >
> > I wasn't trying to say XFS fixes your problem, but that we could
> > implement appending AIO writes in XFS fairly easily.
> >
> > To verify Jan's theory, can you try to preallocate the file to the full
> > size and then run the benchmark by doing a:
> >
> > # fallocate -l <size> <filename>
> >
> > and then run it? If that's indeed the issue I'd be happy to implement
> > the "real aio" append support for you as well.
> >
>
> I've resorted to writing a simple wrapper around io_submit() and ran it
> against a preallocated file (exactly to avoid the append AIO scenario).
> Random data was used to avoid XtremIO online deduplication, but results
> were still wonderful for 4k sequential AIO write:
>
> 744.77 MB/s 190660.17 Req/sec
>
> Clearly Linux lacks "real aio" append support for any FS.
> It seems that you think it would be relatively easy to
> implement it for XFS on Linux? If so, I will really appreciate your
> effort.

Yes, I think it can be done relatively simply. We'd have to change the
code in xfs_file_aio_write_checks() to check whether EOF zeroing was
required rather than always taking an exclusive lock (for block aligned
IO at EOF, sub-block zeroing isn't required), and then we'd have to
modify the direct IO code to set the is_async flag appropriately. We'd
probably need a new flag to tell the DIO code that AIO beyond EOF is
OK, but that isn't hard to do....
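To make the proposed check concrete, here is a minimal C model of the decision xfs_file_aio_write_checks() would have to make for a direct IO write. It is not XFS code: the function name dio_write_needs_excl_lock(), its parameters, and the conservative rule that any write starting beyond EOF still takes the exclusive lock are assumptions made for illustration. What it shows is the point above: a block-aligned write landing exactly at EOF needs no sub-block zeroing, so it could stay on the shared lock and complete asynchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not the real XFS code path.
 * pos/count: byte range of the direct IO write.
 * isize:     current in-core file size (i_size).
 * blksz:     filesystem block size, assumed to be a power of two.
 * Returns true if the write must take the exclusive IO lock.
 */
static bool dio_write_needs_excl_lock(uint64_t pos, uint64_t count,
                                      uint64_t isize, uint64_t blksz)
{
    assert(blksz != 0 && (blksz & (blksz - 1)) == 0);

    /* Sub-block (unaligned) IO may need partial block zeroing. */
    if ((pos | count) & (blksz - 1))
        return true;

    /*
     * A write starting past EOF leaves a gap [isize, pos) that must be
     * zeroed first; if isize is not block aligned this includes the tail
     * of the old EOF block. Assumed here to need the exclusive lock.
     */
    if (pos > isize)
        return true;

    /*
     * Block-aligned write starting at or before EOF: nothing between the
     * old EOF and the write needs zeroing, so a shared lock suffices,
     * even when the write extends the file.
     */
    return false;
}
```

With a 4096-byte block size and i_size of 8192, an aligned 4k append at offset 8192 returns false (shared lock), while a write starting at 16384 returns true because the gap would need zeroing.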
And for those that are wondering about the stale data exposure problem
documented in the aio code:

        /*
         * For file extending writes updating i_size before data
         * writeouts complete can expose uninitialized blocks. So
         * even for AIO, we need to wait for i/o to complete before
         * returning in this case.
         */

This is fixed in XFS by removing a single if() check in
xfs_iomap_write_direct(). We already use unwritten extents for DIO
within EOF to avoid races that could expose uninitialised blocks, so we
just need to make that unconditional behaviour. Hence racing IO on
concurrent appending i_size updates will only ever see a hole (zeros),
an unwritten region (zeros) or the written data.

Christoph, are you going to get any time to look at doing this in the
next few days?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
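The unwritten-extent guarantee described in the message can be illustrated with a toy model of a single block. Everything here is hypothetical: struct block, alloc_for_dio(), dio_write_data(), dio_complete() and read_block() are illustrative names, not XFS functions, and real unwritten-to-written conversion happens in the IO completion path with proper locking rather than as a bare state change. The sketch shows why allocating unwritten extents unconditionally means a reader racing with an appending write sees zeros, never the stale bytes already on disk, until the data write has completed.

```c
#include <assert.h>
#include <string.h>

#define BLKSZ 8 /* tiny "block" to keep the model readable */

/* Hypothetical per-block extent state, loosely mirroring XFS terms. */
enum ext_state { EXT_HOLE, EXT_UNWRITTEN, EXT_WRITTEN };

struct block {
    enum ext_state state;
    unsigned char disk[BLKSZ]; /* on-disk contents; may hold stale data */
};

/* Allocation for a DIO write: under the proposal, always unwritten. */
static void alloc_for_dio(struct block *b)
{
    b->state = EXT_UNWRITTEN; /* stale b->disk bytes are left in place */
}

/* The DIO data lands on disk, but the extent is still unwritten... */
static void dio_write_data(struct block *b, const unsigned char *data)
{
    memcpy(b->disk, data, BLKSZ);
}

/* ...and only IO completion converts it to written. */
static void dio_complete(struct block *b)
{
    assert(b->state == EXT_UNWRITTEN);
    b->state = EXT_WRITTEN;
}

/* A racing reader: holes and unwritten extents read back as zeros. */
static void read_block(const struct block *b, unsigned char *out)
{
    if (b->state == EXT_WRITTEN)
        memcpy(out, b->disk, BLKSZ);
    else
        memset(out, 0, BLKSZ);
}
```

A reader calling read_block() at any point before dio_complete() gets zeros even though the block already holds stale or new bytes; only after completion does it return the written data.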