Hi all,

One of our (app) developers noticed that io_submit() takes a very long
time to return if the program initiates a write to a block device that
has been opened with O_SYNC and O_DIRECT. We traced the slowness to
blkdev_aio_write(), which initiates a disk cache flush if
__generic_file_aio_write() returns either a positive value or
-EIOCBQUEUED. We usually see -EIOCBQUEUED returned, which triggers the
flush, so io_submit() stalls for a long time. That doesn't really feel
like the intended usage pattern for AIO.

The -EIOCBQUEUED case seems a little strange: if an async IO has been
queued (but not necessarily completed), why would we immediately issue a
cache flush? This looks like a setup for the flush racing against the
write, which means the write could land after the flush -- and that
would be bad.

Jeff Moyer proposed a patchset last spring[1] that removed the
-EIOCBQUEUED case and deferred the flush to each filesystem's end_io
handler. Google doesn't turn up any NAKs, but the patches don't seem to
have gone anywhere. Is there a technical reason why these patches
stalled?

Could one establish an end_io handler in blkdev_direct_IO() so that
async writes to an O_SYNC+O_DIRECT block device result in a
blkdev_issue_flush() before aio_complete()? That would seem to fix the
race between the write and the flush.

--D

[1] http://oss.sgi.com/archives/xfs/2012-03/msg00082.html
    "fs: fix up AIO+DIO+O_SYNC to actually do the sync part"

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
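[Appendix, not part of the original mail] A rough sketch of the end_io idea, in kernel-C-flavored pseudocode against no particular tree. The handler name is hypothetical, the signature only approximates the dio_iodone_t of that era, and a real implementation would have to punt the flush to a workqueue, since end_io can run in atomic context and blkdev_issue_flush() sleeps -- which is essentially what Jeff Moyer's patchset[1] did per-filesystem.

```c
/* Pseudocode sketch -- hypothetical handler, not from any posted patch. */
static void blkdev_end_io_flush(struct kiocb *iocb, loff_t offset,
				ssize_t size, void *private, int ret,
				bool is_async)
{
	struct block_device *bdev =
		I_BDEV(iocb->ki_filp->f_mapping->host);

	/*
	 * The data has reached the device (or its volatile cache);
	 * flush before completing the iocb so O_SYNC means what it
	 * says for AIO writes.  NB: blkdev_issue_flush() sleeps, so
	 * in real life this step must be deferred to process context.
	 */
	if (size > 0 && (iocb->ki_filp->f_flags & O_SYNC))
		blkdev_issue_flush(bdev, GFP_KERNEL, NULL);

	if (is_async)
		aio_complete(iocb, size, 0);
}
```

The point of the sketch is ordering: flush strictly after the write
completes and strictly before aio_complete(), which closes the
write-vs-flush race described above.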