Re: [PATCH 1/2] ext4/jbd2: fix io-barrier logic in case of external journal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Mon, Mar 22, 2010 at 07:14:13PM +0300, Dmitry Monakhov wrote:
> > >
> > > Just to be clear, since I realized I wrote fsync() when I should have
> > > written fs/ext4/fsync.c, my proposal was to put this check in
> > > ext4_sync_file().
> > This means that we will end up with two io-barriers in a row
> > for external journal case:
> > ext4_sync_file()
> >   ->jbd2_log_start_commit()
> >      ->blkdev_issue_flush(j_fs_dev)
> >   /* seems following flush is mandatory to guarantee the metadata
> >    * consistency */
> >   ->blkdev_issue_flush(j_fs_dev)
> 
> Well, not if we only issue a barrier in the external barrier case....
> but I agree the logic may start getting ugly.
> 
> > I've already started to work with the patch but was surprised with
> > commit logic. 
> > ->__jbd2_log_start_commit(target)
> >    journal->j_commit_request = target
> >    ->wake_up(&journal->j_wait_commit)
> > ->kjournald2
> >    transaction = journal->j_running_transaction
> >    ->jbd2_journal_commit_transaction(journal);
> >      commit_transaction = journal->j_running_transaction
> >      ...
> >      /* So j_commit_request is used only for comparison. But actually
> >         committing journal->j_running_transaction->t_tid transaction
> >         instead of j_commit_request. Why?
> >       */
> 
> Yeah, I don't think jbd2_log_start_commit() has the semantics that are
> currently documented.  We will return 0 if we wake up the commit
> thread, yes --- but since we only check j_commit_request, and it's a
> geq test, it might very well be the case that the transaction was
> committed long ago, and so we end up kicking off a transaction commit
> when one is not needed.  Or, it may be that a transaction is just
> already being committed (and in fact is just about to be completed),
> and so the wake_up() call is a no-op, and so we don't get a barrier
> when one is needed (and expected) by ext4_sync_file().
> 
> We need to look at this very closely, but I think
> jbd2_log_start_commit() needs to check to see whether there is a
> current committing transaction, and whether it is the one which is
> that has been requested as a target.  If so, and a barrier is
> requested, I think we need to have jbd2_log_start_commit() do the
> barrier.  There is a risk of having a double barrier in that case, but
> modifying the flag on the currently committing transaction has the
> danger of being lost by kjournald --- or I guess alternatively we
> could have kjournald check that flag while holding j_state_lock, which
> might be better.
  But still if you happen to set the flag after commit code has checked
it, you have lost the race. Given the bug you describe below, I think
we should provide a new call from JBD2 that will do all the necessary
magic for given TID - something like:
	data_barrier = 0;
	if (journal->j_commit_sequence > tid)
		data_barrier = 1;
	else if (journal->j_committing_transaction &&
		 journal->j_committing_transaction->t_tid == tid) {
		/* Here we could be more clever and set the barrier
		 * flag of the transaction if according to its state
		 * it has not yet issued data barrier */
		data_barrier = 1;
	} else {
		journal->j_running_transaction->has_data = 1;
		jbd2_log_start_commit(journal, tid);
	}
	jbd2_log_wait_commit(journal, tid);
	if (data_barrier)
		blk_dev_issue_flush(journal->j_fs_dev);

  What do you think?

> If the currently running transaction is the requested target, then
> that's easy; we can set the flag, and then wake up the j_wait_commit
> wait queue.  
> 
> In any case, log_start_commit() doesn't currently distinguish between
> whether the targetted commit is the running or the committing
> transaction when it returns 0, and I think that's a problem.
  Yeah, I agree this is a problem and it could cause ext4_sync_file not
properly wait. The simplest fix is to call jbd2_log_wait_commit
unconditionally. But I'd maybe go with the above approach.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SuSE CR Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux