Hi Mulyadi,

Thanks for your opinion. Well, if you ask me, JBD and the I/O scheduler
are two independent layers, so I don't think the ordering of the data and
metadata is done at that level. But there may be something to the data
completion handler you're talking about, I think.

Simplistically, during a write() in data=ordered mode:

1. While updating the metadata (before the data is copied), the kernel
   updates the metadata buffers and moves the metadata block to a list in
   the active transaction (which is going to be logged).
2. Then the actual data buffers (memory) are updated with the contents.
3. Then journal_dirty_data is called on each affected data buffer (this
   apparently ensures that the data is written before the metadata; I
   don't know how).
4. And then the block buffers are committed (marked as dirty so that the
   page flushing mechanism can send them to disk).

Now steps 3 and 4 seem to be independent, so I don't know how step 3
knows when step 4 completes. The only way I can think of is that step 4
somehow calls a callback into step 3 once it's done?

Let me know if the above analysis makes sense.

Thanks,
-Joel

On Sun, Oct 25, 2009 at 9:40 PM, Mulyadi Santosa <mulyadi.santosa@xxxxxxxxx> wrote:
> Hi Joel...
>
> On Mon, Oct 26, 2009 at 4:33 AM, Joel Fernandes <agnel.joel@xxxxxxxxx> wrote:
>> In data=ordered mode the ext3_ordered_commit_write function marks the
>> buffers as dirty. How then does the JBD ensure that the data is
>> written before the metadata? Once the data buffers are marked as
>> dirty, JBD doesn't have control anymore over when the data is
>> actually written to disk, right? Because the actual writing of the
>> data is handled by the page writeback mechanism (pdflush), right?
>
> I am not an expert, but here's my thought:
>
> I think writing to the backing device is not done simply by marking
> the buffer/page cache dirty. So, I think what the kernel does is first
> prepare an I/O queue to update the ext3 journal. Since we are talking
> about data=ordered here, only metadata is logged.
>
> Perhaps the key here is that metadata writing is done as an async
> completion handler of the data writing handler. Thus, data is written
> first, followed by metadata logging.
>
> Another possibility is composing a single atomic I/O write request,
> made up of data writing and metadata logging. Thus, the I/O scheduler
> won't be able to re-order the request and must complete the sequence
> as we prepared it.
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
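
P.S. To make the question concrete, here is a toy Python model (not
kernel code) of steps 1-4 from my reply. It sketches one way the ordering
could work without any callback from step 4: the journal_dirty_data step
would only record the buffer on a per-transaction list, and the commit
itself would flush that list before logging the metadata. That is an
assumption on my part, not something I have confirmed in the JBD source.
ToyTransaction, ToyDisk, commit and demo are made-up names for
illustration only.

```python
class ToyDisk:
    """Toy backing device: tracks dirty buffers and the order of writes."""

    def __init__(self):
        self.dirty = set()   # buffers waiting for writeback
        self.log = []        # order in which things reached the "disk"

    def mark_dirty(self, buf):
        self.dirty.add(buf)

    def write(self, buf):
        # Write the buffer only if it is still dirty, so a buffer that
        # background writeback already flushed is not written twice.
        if buf in self.dirty:
            self.dirty.discard(buf)
            self.log.append(buf)


class ToyTransaction:
    """Toy stand-in for a running journal transaction (hypothetical)."""

    def __init__(self):
        self.metadata = []   # step 1: metadata buffers that will be logged
        self.data = []       # step 3: data buffers that must hit disk first

    def add_metadata(self, buf):
        self.metadata.append(buf)

    def dirty_data(self, buf):
        # Assumed analogue of journal_dirty_data: just remember the buffer.
        self.data.append(buf)


def commit(txn, disk):
    """Flush the transaction's data buffers, then log its metadata."""
    for buf in txn.data:
        disk.write(buf)
    for buf in txn.metadata:
        disk.log.append("journal:" + buf)
    txn.data.clear()
    txn.metadata.clear()


def demo(early_writeback=True):
    disk = ToyDisk()
    txn = ToyTransaction()
    txn.add_metadata("inode")            # step 1
    for buf in ("data0", "data1"):       # step 2: contents copied in
        txn.dirty_data(buf)              # step 3
        disk.mark_dirty(buf)             # step 4
    if early_writeback:
        disk.write("data0")              # writeback may flush a buffer early
    commit(txn, disk)
    return disk.log


if __name__ == "__main__":
    print(demo())
```

In this model the commit never needs to be told when step 4 finishes: it
owns a list of the data buffers and writes whichever ones are still dirty
before it touches the journal, so the metadata entry always comes last.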