This answered all my questions! Many thanks! Will check the phase 2 code.

Xin

On 4/7/06, Zach Brown <zab@xxxxxxxxx> wrote:
>
> > If a program accesses data like this:
> >
> > 1. open the file
> > 2. write a lot of data into this file
>
> You don't say if this is an extending write or overwriting existing file
> data. I'm going to assume extending writes so that data=ordered kicks in.
>
> > 3. close the file
>
> > So my questions are:
> > 1. How will the file system be notified after all data has been
> > flushed to disk?
>
> Look at phase 2 in journal_commit_transaction(). The kjournald thread
> issues the writeback of the file data by walking t_sync_datalist and
> then waits for the writeback to complete by using wait_on_buffer()
> before committing the transaction.
>
> > 2. Unlike data=journal mode, in data=ordered mode, the data could be
> > lost if the system crashes while data is being flushed to disk. When
> > the system reboots, does the journal contain the old metadata for undo?
>
> No, ext3 isn't roll-backward. It doesn't store the *old* data in the
> journal and undo the change if it fails halfway through. It's
> roll-forward. It stores the *new* data in the journal and replays
> complete transactions in the journal that weren't moved out to their
> final place on disk at the time of the crash.
>
> So if the machine reboots during the writeback phase then the
> transaction won't be committed yet and recovery won't replay that
> transaction from the journal. From the metadata's point of view the
> file extension will never have happened.
>
> > 3. Does sys_close() have to be blocked until all data and metadata
> > are committed?
>
> No, and neither does sys_getpid() :)
>
> > to take subsequent operation. However, the data flush could fail. In
> > this case, the file system seems to mislead the application. Is this true?
>
> No. The application has no grounds for assuming that a successful
> close() has synced previous operations to disk. It's simply not part of
> the API.
>
> > If so, any solutions?
>
> The application should rely on tools like fsync(), fdatasync(), O_SYNC,
> mount -o sync, etc. Whatever suits it best.
>
> - z
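
For anyone reading this thread later, here is a minimal sketch of the fsync() route mentioned above, for the open / write / close sequence from the original question. The helper name write_durably() is hypothetical and not from the thread, and the open flags and 0644 mode are arbitrary choices for illustration. It reports success only after fsync() has returned without error, rather than treating a successful close() as proof that the data reached disk.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write len bytes from buf to
 * path and return 0 only after fsync() has succeeded. Returns -1 with
 * errno set on any failure.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
    /* Flags and mode are assumptions made for this example. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop until done. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /*
     * Ask the kernel to flush this file's data and metadata. An error
     * here is how the application learns that the flush failed.
     */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also report an error, so pass its result through. */
    return close(fd);
}
```

A caller would check the result, for example `if (write_durably("out.dat", data, size) != 0) perror("write_durably");`, and only then proceed with operations that depend on the data being on disk. fdatasync() or opening with O_SYNC could be substituted in the same place, depending on what suits the application.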