Re: [PATCH 2/2] ext4: optimize ext4_force_commit

Dmitry Monakhov <dmonakhov@xxxxxxxxxx> · Tue, 09 Apr 2013 18:06:58 +0400

On Tue, 9 Apr 2013 17:42:33 +0400, Dmitry Monakhov <dmonakhovopenvz@xxxxxxxxx> wrote:
Non-text part: multipart/alternative
> ---------- Forwarded message ----------
> From: "Theodore Ts'o" <tytso@xxxxxxx>
> Date: Apr 9, 2013 5:31 PM
> Subject: Re: [PATCH 2/2] ext4: optimize ext4_force_commit
> To: "Dmitry Monakhov" <dmonakhovopenvz@xxxxxxxxx>
> Cc:
> 
> On Mon, Apr 01, 2013 at 11:10:59PM +0400, Dmitry Monakhov wrote:
> > Yes. I think so too.  Even more jbd2_force_commit_nested already does what
> > we want because we are not hold transaction. I'll send patch tomorrow
> 
> Hi Dmitry,
> 
> Did you send an update for this patch?  I can't seem to find it.  I'm
> wondering if we perhaps got distracted by the big endian patch, and we
> didn't get back to this commit.
Ohh It is appeared that it requires more deep analysis.
For example sync(2) is implicitly broken because ext4_sync_fs() may not
send a flush barrier. So following case is possible:

dd if=/dev/zero of=file oflag=direct bs=1M count=1
sync
POWER_FAILURE-> result in data lost. Because:

sys_write()
-> __generic_file_aio_write
   ->file_update_time
    ->update_time
      -> touch metadata -> ext4_update_inode_fsync_trans

  <# A lot of journal_start/journal_stop# >
 ->submit_bio  

 ->dio_complete

sys_sync()
 -> flush_inodes (may not start a journal)
 -> ext4_sync_fs
    if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
                if (wait)
                        jbd2_log_wait_commit(sbi->s_journal, target);

   But no one guarantee us that we start any transaction since dio was
   completed so barrier will not be send. This means we may lose
   our data on power failure.
At this moment I try to figure up sane data integrity model for
fsync/syncfs. Hope I'll send you patches at the end of the week.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html