On Fri, Sep 1, 2017 at 12:52 PM, Christoph Hellwig <hch@xxxxxx> wrote:
> On Thu, Aug 31, 2017 at 10:20:19PM +0300, Amir Goldstein wrote:
>> > IIUC, basically we need to guarantee that a flush submits after
>> > file_write_and_wait() and completes before we return.
>>
>> Yeh. unless we check if file_write_and_wait() submitted anything
>> at all.
>
> Even if file_write_and_wait did not submit anything we need to
> make sure a flush was submitted and completed after entering
> xfs_file_fsync. For one to deal with the case where we wrote
> back data from the flusher threads or the VM, and also for
> the direct I/O case.
>

Right.

>
> Btw, do you have any results for your simple catch? I wonder
> how much of an issue it actually is in practice.

Well, since the bug was demonstrated using a crash simulator, I have no
idea how often one would run into this with an actual power failure, but
does it really matter how often, if we *know* that it can happen?

The sequence of bios that I illustrated in the commit message is what I
saw regardless of the crash simulator, and that sequence leaves a data
block in the disk cache, with nothing that guarantees a FLUSH afterwards.

So for an unlucky subject, that gives a window of data loss in case of
power failure, at least until the next time a flusher thread runs.
That could be seconds, no?

I am working on a variant of the flushseq you suggested, using
log->l_last_sync_lsn, that may be simple enough and not as painful as
turning off the optimization for WANT_SYNC log buffers.

Amir.