Hello! On Dec 15, 2009, at 11:45 AM, tytso@xxxxxxx wrote: >>> + /* >>> + * If the journal is not located on the file system device, >>> + * then we must flush the file system device before we issue >>> + * the commit record >>> + */ >>> + if (commit_transaction->t_flushed_data_blocks && >>> + (journal->j_fs_dev != journal->j_dev) && >>> + (journal->j_flags & JBD2_BARRIER)) >>> + blkdev_issue_flush(journal->j_fs_dev, NULL); >>> + >> I am afraid this is not enough. This code is called after journal >> was flushed for async commit case, so it leaves a race window where >> journal transaction is already on disk and complete, but the data is >> still in cache somewhere. > No, that's actually fine. In the ASYNC_COMMIT case, the commit won't > be valid until the checksum is correct, and we won't have written any > descriptor blocks yet at this point. So there is no race because What do you mean by descriptor blocks? Actual logged blocks? > during that window, the commit is written but we won't write any > descriptor blocks until after the barrier returns. >From my reading of the code, in jbd2_journal_commit_transaction: After first "JBD: commit phase 2" message we submit actual data buffers. Then after second "JBD: commit phase 2" message (in the "while (commit_transaction->t_buffers) {" loop) we actually iterate through all metadata changes and submit them to disk (after start_journal_io: label) Then journal_submit_commit_record writes commit record. At this point on we have entire transacton + commit block in the write queue (or partially already on disk) + all the data blocks in the write queue. Next blkdev_issue_flush on journal device is done, cementing the transaction on the permanent storage, but the actual data is still potentially not there abd it is not until the call to journal_finish_inode_data_buffers() where that data is flushed too. If you refer to the code after "JBD: commit phase 3" that waits on actual transaction blocks buffers to make sure they made it to the disk, my reading of blkdev_issue_flush is that it adds the flush at the end of the queue, i.e. after all of the journalled blocks that are sitting in there, so they all would be already copleted by the time blkdev_issue_flush returns. Bye, Oleg-- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html