Re: ext3 writing of data before metadata in ordered mode

Hi Mulyadi,

Thanks for your opinion. Well, if you ask me, JBD and the I/O scheduler
are two independent layers, so I don't think the ordering of data and
metadata is enforced at the I/O scheduler level. But I think there may be
something to the data completion handler you're talking about.
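The completion-handler idea can be sketched as a small user-space model. It illustrates the hypothesis only and is not a claim about how JBD actually works. Every name in it (OrderedCommit, submit_data, on_data_complete, request_commit, run_device) is made up for illustration. Each data write reports completion through a callback, and the metadata is logged only once no data write is outstanding:

```python
# Hypothetical model of the "completion handler" idea: metadata is logged
# only after every submitted data write has reported completion.
# All names here are illustrative; none of them are kernel or JBD functions.
from collections import deque


class OrderedCommit:
    def __init__(self, log):
        self.log = log                # order in which blocks reach "disk"
        self.pending = 0              # data writes submitted but not completed
        self.commit_requested = False
        self.metadata = []

    def submit_data(self, block, io_queue):
        self.pending += 1
        # The simulated device calls our handler when this write finishes.
        io_queue.append((block, self.on_data_complete))

    def on_data_complete(self, block):
        self.log.append(("data", block))
        self.pending -= 1
        self._maybe_log_metadata()

    def request_commit(self, metadata_blocks):
        self.metadata.extend(metadata_blocks)
        self.commit_requested = True
        self._maybe_log_metadata()

    def _maybe_log_metadata(self):
        # Metadata may go to the journal only when no data write is outstanding.
        if self.commit_requested and self.pending == 0:
            for block in self.metadata:
                self.log.append(("journal", block))
            self.metadata.clear()
            self.commit_requested = False


def run_device(io_queue):
    # Complete queued writes in FIFO order, firing each completion callback.
    while io_queue:
        block, callback = io_queue.popleft()
        callback(block)


log = []
io = deque()
oc = OrderedCommit(log)
oc.submit_data(500, io)
oc.submit_data(501, io)
oc.request_commit([10])
print(log)  # []: commit deferred, two data writes still pending
run_device(io)
print(log)  # [('data', 500), ('data', 501), ('journal', 10)]
```

The pending counter is what turns "a callback after it's done" into an ordering guarantee. The commit request does nothing until the last data completion drops the count to zero.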

Simplistically, during a write() in data=ordered mode:
1. While the metadata is being updated (before the data is copied), the
kernel updates the metadata buffers and moves the metadata block onto a
list in the active transaction (which is going to be logged).
2. Then the actual data buffers (in memory) are updated with the contents.
3. Then journal_dirty_data is called on each affected data buffer
(this apparently ensures that the data is written before the metadata,
but I don't know how).
4. Finally the block buffers are committed (marked dirty so that the
page flushing mechanism can send them to disk).
Steps 3 and 4 seem to be independent, so I don't see how step 3 knows
when step 4 completes. The only way I can think of is that step 4
somehow calls back into step 3 once it is done.
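Here is a rough user-space model of the four steps, built on one assumption that is not taken from the kernel source: step 3 files the data buffer on a per-transaction list, and the commit path itself writes out anything on that list that is still dirty before it logs the metadata. Under that assumption no callback from step 4 is needed. It would not matter whether pdflush reached the buffer first. Buffer, Transaction, ordered_write, commit and disk_log are all hypothetical names.

```python
# Hypothetical user-space model of the four write() steps in data=ordered
# mode. Buffer, Transaction, ordered_write, commit and disk_log are made-up
# names for illustration; they are not kernel functions or structures.
from dataclasses import dataclass, field


@dataclass
class Buffer:
    block: int
    contents: bytes = b""
    dirty: bool = False


@dataclass
class Transaction:
    metadata: list = field(default_factory=list)   # metadata blocks to log
    sync_data: list = field(default_factory=list)  # data that must reach disk first


disk_log = []  # order in which blocks reach the "disk"


def write_block(buf, where):
    """Simulate writing one buffer; `where` is "data" or "journal"."""
    disk_log.append((where, buf.block))
    buf.dirty = False


def ordered_write(txn, meta, data, payload):
    meta.dirty = True
    txn.metadata.append(meta)   # 1. file metadata in the running transaction
    data.contents = payload     # 2. copy user data into the data buffer
    txn.sync_data.append(data)  # 3. register the data buffer with the transaction
    data.dirty = True           # 4. mark dirty for ordinary writeback (pdflush)


def commit(txn):
    # ASSUMPTION: the commit path itself writes out any registered data
    # buffer that is still dirty before logging the metadata, so it does not
    # matter whether writeback reached the buffer first.
    for buf in txn.sync_data:
        if buf.dirty:
            write_block(buf, "data")
    for buf in txn.metadata:
        write_block(buf, "journal")
    txn.sync_data.clear()
    txn.metadata.clear()


txn = Transaction()
inode, page = Buffer(block=10), Buffer(block=500)
ordered_write(txn, inode, page, b"hello")
commit(txn)
print(disk_log)  # [('data', 500), ('journal', 10)]
```

Because write_block completes synchronously here, "written" and "waited on" are the same thing. A real commit would also have to wait for the submitted I/O to finish before writing the metadata.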

Let me know if the above analysis makes sense. Thanks.

-Joel

On Sun, Oct 25, 2009 at 9:40 PM, Mulyadi Santosa
<mulyadi.santosa@xxxxxxxxx> wrote:
> Hi Joel...
>
> On Mon, Oct 26, 2009 at 4:33 AM, Joel Fernandes <agnel.joel@xxxxxxxxx> wrote:
>> In data=ordered mode the ext3_ordered_commit_write function marks the
>> buffers as dirty, so how does JBD ensure that the data is written
>> before the metadata? Once the data buffers are marked dirty, JBD no
>> longer controls when the data is actually written to disk, right?
>> Because the actual writing of the data is handled by the page
>> writeback mechanism (pdflush), right?
>
> I am not an expert, but here's my thought:
>
> I think writing to the backing device is not done simply by marking the
> buffer/page cache dirty. So I think what the kernel does is first prepare
> an I/O queue to update the ext3 journal. Since we are talking about
> data=ordered here, only metadata is logged.
>
> Perhaps the key here is that metadata writing is done as an async
> completion handler of the data write. Thus, data is written first,
> followed by metadata logging.
>
> Another possibility is composing a single atomic I/O write request
> made up of the data write and the metadata logging. Thus the I/O
> scheduler won't be able to reorder the request and must complete the
> sequence as we prepared it.
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>
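Mulyadi's second idea, an ordering the I/O scheduler is not allowed to break, can be modeled with an ordering barrier between requests rather than a literal single atomic request. This is a swapped-in technique used only for illustration. BARRIER and dispatch() are hypothetical names, not a kernel interface. The model elevator sorts requests by sector for efficiency but never moves a request across a barrier:

```python
# Hypothetical model of an elevator that may reorder requests by sector for
# efficiency but never moves a request across an ordering point ("barrier").
# BARRIER and dispatch() are illustrative names, not a kernel interface.
BARRIER = None


def dispatch(queue):
    order, segment = [], []
    for req in queue:
        if req is BARRIER:
            order.extend(sorted(segment))  # free reordering inside a segment
            segment = []
        else:
            segment.append(req)
    order.extend(sorted(segment))
    return order


# Requests are (sector, kind); journal blocks here sit at lower sectors.
no_fence = [(9000, "data"), (100, "journal")]
fenced = [(9000, "data"), BARRIER, (100, "journal")]
print(dispatch(no_fence))  # [(100, 'journal'), (9000, 'data')]
print(dispatch(fenced))    # [(9000, 'data'), (100, 'journal')]
```

Without the barrier the sector sort happily sends the journal write first. With it, the data write is dispatched before the metadata, whatever the sector numbers.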
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
