On Tue, Nov 29, 2022 at 11:20:05AM -0800, Shawn wrote:
> Hello all,
> I implemented a write workload by sequentially appending to the file
> end using libaio aio_write in O_DIRECT mode (with proper offset and
> buffer address alignment). When I reach a 1MB boundary I call
> fallocate() to extend the file.
>
> I need to protect the write from various failures such as disk unplug
> / power failure. The bottom line is, once I ack a write-complete,
> the user must be able to read it back later after a disk/power failure
> and recovery.
>
> In my understanding, fallocate() will preallocate disk space for the
> file, and I can call fsync to make sure the file metadata about this
> new space is persisted when fallocate returns. Once aio_write returns
> the data is on disk. So it seems I don't need fsync after
> aio_write completion, because (1) the data is on disk, and (2) the
> file metadata to address the disk blocks is on disk.
>
> On the other hand, it seems XFS always does a delayed allocation
> which might break my assumption that the file=>disk space mapping is
> persisted by fallocate.
>
> I can improve the on-disk data format to carry a proper header/footer
> to detect a broken write when scanning the file after a disk/power
> failure.
>
> Given all of the above, do I still need an fsync() after aio_write
> completion in XFS to protect data persistence?

Yes. The only time you don't is if you're performing an O_SYNC write to
a part of a file that you've already written (and fsync'd) that's
entirely below EOF and you've arranged that the filesystem will never
COW or otherwise require metadata updates.

Hey, at least aio_fsync works now...

--D

> Thanks all for your input!
>
> regards,
> Shawn
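
For illustration only, the "Yes" above amounts to: do not ack the write until an fsync() issued after it has returned success. A minimal sketch in C follows. It assumes a synchronous pwrite() in place of the libaio submission to keep it short, and durable_write() is a hypothetical helper name, not something from this thread. With AIO the same ordering would apply: wait for the write completion, then issue the fsync, and ack only after that.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is an assumption for this sketch):
 * write len bytes from buf at offset off, then fsync() the file.
 * Returns 0 only once fsync() has succeeded, -1 otherwise.
 * The caller should ack the write to the user only on a 0 return.
 */
int durable_write(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing, retry */
            return -1;
        }
        if (n == 0) {
            errno = EIO;        /* no progress, treat as an error */
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }

    /* Write completion alone is not the durability point; fsync is. */
    if (fsync(fd) != 0)
        return -1;

    return 0;
}
```

If fsync() fails, the safest assumption is that the data may not be on stable storage, so the write should not be acked.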