Re: [PATCH] Make non-journal fsync work properly.

Theodore Tso <tytso@xxxxxxx> · Tue, 8 Sep 2009 01:06:14 -0400

On Fri, Sep 04, 2009 at 07:55:00PM -0700, Frank Mayhar wrote:
> Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
> mode:  If we're not using a journal, ext4_write_inode() now calls
> ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
> with a new "do_sync" parameter.  If that parameter is nonzero
> ext4_do_update_inode() calls sync_dirty_buffer() instead of
> ext4_handle_dirty_metadata().

Hi Frank,

The problem with this patch is that it's only safe to call
sync_dirty_buffer() if we are not journalling.  If we are using the
journal, we must *not* call sync_dirty_buffer(), but instead must use
jbd2_journal_dirty_metadata().

The trouble is that there are paths where ext4_do_update_inode() can
get called with do_sync==1 even when journalling is enabled.
Specifically, if ext4_write_inode() is called with wait==1, wait is
passed to ext4_do_update_inode() as do_sync; when a journal is
present, we then end up calling sync_dirty_buffer(), which means we
will be writing out the modified metadata *before* the transaction has
committed.
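
To make that path concrete, here is a toy userspace model of it.  This
is a sketch of my reading of the patch, not real ext4 code: every name
below (patched_do_update_inode(), model_write_inode(),
writes_before_commit(), the enum) is a hypothetical stand-in, and the
model assumes do_sync alone picks the write path.

```c
#include <stdbool.h>

/* Toy userspace model; none of this is the real ext4 code. */
enum write_path {
	PATH_JOURNAL,      /* stands in for jbd2_journal_dirty_metadata() */
	PATH_SYNC_BUFFER   /* stands in for sync_dirty_buffer() */
};

/*
 * Assumed shape of the patched ext4_do_update_inode(): do_sync alone
 * selects the path, and journal presence is never consulted.
 */
static enum write_path patched_do_update_inode(bool has_journal, int do_sync)
{
	(void)has_journal;	/* ignored on purpose: this is the hazard */
	return do_sync ? PATH_SYNC_BUFFER : PATH_JOURNAL;
}

/* Models ext4_write_inode() handing its wait argument on as do_sync. */
static enum write_path model_write_inode(bool has_journal, int wait)
{
	return patched_do_update_inode(has_journal, wait);
}

/* True when metadata would reach disk ahead of its transaction commit. */
static bool writes_before_commit(bool has_journal, enum write_path path)
{
	return has_journal && path == PATH_SYNC_BUFFER;
}
```

In this model, model_write_inode(true, 1) returns PATH_SYNC_BUFFER, so
writes_before_commit() is true for the journalled, wait==1 case, and
false when there is no journal.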

If you run your patch with journalling enabled and do some power-fail
testing, my code inspection leads me to believe, with 99% certainty,
that the filesystem will be corrupted as a result.

I think what you need to do instead is to add an extra parameter,
do_sync, to ext4_handle_dirty_metadata(), and continue to call
ext4_handle_dirty_metadata().  However, in code paths where we will
later force a commit to guarantee that the metadata has been written
out (i.e., in the fsync() code path), ext4_handle_dirty_metadata()
should be called with the new do_sync parameter set to 1.
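
One possible reading of that suggestion, again as a toy userspace
model rather than a real implementation: the helper itself looks at
whether a journal is present, so do_sync can only ever lead to a
synchronous write in non-journal mode.  The name
model_handle_dirty_metadata() and the enum are hypothetical, and the
"just mark the buffer dirty" case for non-journal, do_sync==0 is my
assumption.

```c
#include <stdbool.h>

/* Toy userspace model; none of this is the real ext4 code. */
enum write_action {
	ACT_JOURNAL_DIRTY,  /* jbd2_journal_dirty_metadata(); commit writes it */
	ACT_MARK_DIRTY,     /* assumed: no journal, no sync, mark buffer dirty */
	ACT_SYNC_BUFFER     /* no journal, do_sync set: sync_dirty_buffer() */
};

/*
 * Hypothetical model of ext4_handle_dirty_metadata() with a do_sync
 * parameter.  With a journal present, do_sync never causes a direct
 * write; durability comes from the commit the caller forces later.
 */
static enum write_action model_handle_dirty_metadata(bool has_journal,
						     int do_sync)
{
	if (has_journal)
		return ACT_JOURNAL_DIRTY;
	return do_sync ? ACT_SYNC_BUFFER : ACT_MARK_DIRTY;
}
```

The point of putting the check inside the helper is that no caller can
reach sync_dirty_buffer() with a journal present, whatever it passes
for do_sync.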

Does that make sense?

						- Ted