Bryan Henderson wrote:
> > For database writes, you typically write a bunch of stuff in various
> > regions of a big file (or multiple files), then ideally fdatasync
> > some/all of the written ranges - with writes committed to disk in the
> > best order determined by the OS and I/O scheduler.
> >
> > For this, taking a vector of multiple ranges would be nice.
> > Alternatively, issuing parallel fsync_range calls from multiple
> > threads would approximate the same thing - if (big if) they aren't
> > serialised by the kernel.
>
> That sounds like a job for fadvise().  A new FADV_WILLSYNC says you're
> planning to sync that data soon.  The kernel responds by scheduling the
> I/O immediately.  fsync_range() takes a single range and in this case is
> just a wait.  I think it would be easier for the user as well as more
> flexible for the kernel than a multi-range fsync_range() or multiple
> threads.

FADV_WILLSYNC is already implemented: sync_file_range() with
SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE.  That will block in
a few circumstances, but maybe that's inevitable.

If you called FADV_WILLSYNC on a few ranges to mean "soon", how do you
wait until those ranges are properly committed?  How do you ensure the
right low-level I/O barriers are sent for those ranges before you
start writing post-barrier data?

I think you're saying call FADV_WILLSYNC first on all the ranges, then
call fsync_range() on each range in turn to wait for the I/O to be
complete - although that will cause unnecessary I/O barriers, one per
fsync_range().

You can do something like that with sync_file_range() at the moment,
except that there is no way to ask for the barrier.

-- Jamie
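
P.S. For illustration, the two-pass pattern discussed above (start
writeback on every range first, then wait on each range in turn) could
be sketched roughly as follows with sync_file_range() as it exists
today.  This is only a sketch: struct byte_range and sync_ranges() are
made-up names, not existing interfaces, and as noted above nothing here
asks for an I/O barrier.

```c
#define _GNU_SOURCE             /* sync_file_range() is Linux-specific */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: one dirty byte range within the file. */
struct byte_range {
    off64_t offset;
    off64_t nbytes;
};

/*
 * Hypothetical helper: flush several ranges of one file in two passes.
 *
 * Pass 1 initiates writeback on every range, so the kernel and I/O
 * scheduler see all of the pending I/O before we wait on any of it.
 * Pass 2 waits for each range to finish.
 *
 * Returns 0 on success, or -1 with errno set by sync_file_range().
 * No barrier or disk cache flush is requested here.
 */
int sync_ranges(int fd, const struct byte_range *ranges, size_t n)
{
    size_t i;

    /* Pass 1: the "will sync soon" hint, i.e. start the I/O now. */
    for (i = 0; i < n; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].nbytes,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE) < 0)
            return -1;
    }

    /* Pass 2: wait until each range has been written out. */
    for (i = 0; i < n; i++) {
        if (sync_file_range(fd, ranges[i].offset, ranges[i].nbytes,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
            return -1;
    }

    return 0;
}
```

A caller would collect the ranges it has written into an array and
call sync_ranges(fd, ranges, n) once before writing any data that
depends on them.  Whether that is enough for a commit still depends on
the barrier question above.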