Re: [RFC][PATCH] Journal superblock update should send a barrier

Jan Kara <jack@xxxxxxx> · Fri, 7 May 2010 17:53:25 +0200



>   Hi,
> 
>   while reading through the checkpointing code I realized that we
> actually have to send a barrier before each update of the journal superblock
> after checkpointing. The attached patch does this. I'm just not sure whether
> the performance cost will be too big. In principle, we could make this
> more lightweight by using the fact that transaction commit also sends a
> barrier. So before sending the barrier for transaction commit, we could
> check whether we are slowly running out of journal space and, if so, whether
> some transaction is already checkpointed. If yes, we can happily submit the
> update of the journal superblock after the barrier. If the journal is
> decently large, this should solve the checkpointing problem without
> introducing noticeable overhead...
  Ping? Ted, any opinion?
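
  To make the lighter-weight variant from the quoted mail a bit more
concrete, here is a rough userspace sketch of the decision it describes.
Everything in it is hypothetical: struct toy_journal, toy_space_low(),
toy_commit() and the "less than a quarter free" threshold are made up for
illustration and are not jbd2 code. It only models the bookkeeping (one
barrier per commit, tail update piggybacked on it when space is low).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified journal state; NOT the real journal_t. */
struct toy_journal {
	unsigned long total_blocks;  /* size of the journal */
	unsigned long free_blocks;   /* space currently available */
	unsigned long reclaimable;   /* blocks of already checkpointed transactions */
	unsigned int barriers_sent;  /* cache flushes issued so far */
	unsigned int sb_updates;     /* journal superblock writes so far */
};

/* Assumed threshold: "running low" means under a quarter of the journal is free. */
static bool toy_space_low(const struct toy_journal *j)
{
	return j->free_blocks < j->total_blocks / 4;
}

/*
 * Models one transaction commit. The barrier is sent for the commit anyway.
 * The decision to advance the tail is taken BEFORE the barrier, so only
 * transactions checkpointed before the barrier are dropped, and the
 * superblock update is issued only AFTER it.
 * Returns true if the journal tail was advanced.
 */
static bool toy_commit(struct toy_journal *j)
{
	bool advance_tail;

	assert(j->free_blocks + j->reclaimable <= j->total_blocks);
	advance_tail = toy_space_low(j) && j->reclaimable > 0;

	j->barriers_sent++;              /* the barrier the commit sends anyway */

	if (advance_tail) {
		/* superblock update submitted after the barrier */
		j->free_blocks += j->reclaimable;
		j->reclaimable = 0;
		j->sb_updates++;
	}
	return advance_tail;
}
```

With a 1000-block toy journal that has 100 blocks free and 300 reclaimable,
the first toy_commit() advances the tail using the commit's own barrier, and
the next one does nothing extra because space is no longer low.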

								Honza

> -- 
> Jan Kara <jack@xxxxxxx>
> SUSE Labs, CR

> From e749e83627c35c683114fea32695e581a487e560 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@xxxxxxx>
> Date: Fri, 23 Apr 2010 02:12:32 +0200
> Subject: [PATCH] ext4: Send barrier before updating journal superblock after checkpointing
> 
> We have to send a disk barrier after we have finished checkpointing and
> before we update the journal superblock and thus effectively remove
> transactions from the journal. Otherwise the write of the journal superblock
> can be reordered before the writes of checkpointed journal blocks, so after
> a crash these blocks may not be on the platter, which leads to filesystem
> corruption.
> 
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/jbd2/checkpoint.c |   19 +++++++++----------
>  1 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 30beb11..b2de17f 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -519,17 +519,16 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
>  	spin_unlock(&journal->j_state_lock);
>  
>  	/*
> -	 * If there is an external journal, we need to make sure that
> -	 * any data blocks that were recently written out --- perhaps
> -	 * by jbd2_log_do_checkpoint() --- are flushed out before we
> -	 * drop the transactions from the external journal.  It's
> -	 * unlikely this will be necessary, especially with a
> -	 * appropriately sized journal, but we need this to guarantee
> -	 * correctness.  Fortunately jbd2_cleanup_journal_tail()
> -	 * doesn't get called all that often.
> +	 * We need to make sure that any data blocks that were recently written
> +	 * out --- perhaps by jbd2_log_do_checkpoint() --- are flushed out
> +	 * before we drop the transactions from the journal. Otherwise journal
> +	 * superblock write could be reordered before writeout of data and thus
> +	 * we could corrupt the filesystem in case of crash. It's unlikely this
> +	 * will be necessary, especially with an appropriately sized journal,
> +	 * but we need this to guarantee correctness.  Fortunately
> +	 * jbd2_cleanup_journal_tail() doesn't get called all that often.
>  	 */
> -	if ((journal->j_fs_dev != journal->j_dev) &&
> -	    (journal->j_flags & JBD2_BARRIER))
> +	if (journal->j_flags & JBD2_BARRIER)
>  		blkdev_issue_flush(journal->j_fs_dev, NULL);
>  	if (!(journal->j_flags & JBD2_ABORT))
>  		jbd2_journal_update_superblock(journal, 1);
> -- 
> 1.6.4.2
> 
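
  For those who prefer to see the ordering requirement from the changelog
as code: below is a userspace analogy only, not kernel code. It assumes two
plain file descriptors standing in for the filesystem device and the
journal, and uses fsync() where the patch uses blkdev_issue_flush(). The
function name and its parameters are made up for this sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one checkpointed block to the "filesystem",
 * then record the new journal tail in the "journal superblock" at offset 0.
 *
 * With use_barrier set, the block is forced to stable storage before the
 * tail is written. Without it, nothing stops the tail update from reaching
 * the disk first, and a crash in between would lose the only copy of the
 * block (the journal no longer covers it).
 *
 * Returns 0 on success, -1 on error.
 */
static int toy_checkpoint_and_advance_tail(int fs_fd, int journal_fd,
					    const void *block, size_t len,
					    off_t block_off, uint64_t new_tail,
					    int use_barrier)
{
	/* 1. checkpoint: write the block to its final location */
	if (pwrite(fs_fd, block, len, block_off) != (ssize_t)len)
		return -1;

	/* 2. barrier: make sure step 1 is durable before going on */
	if (use_barrier && fsync(fs_fd) != 0)
		return -1;

	/* 3. only now drop the transaction by moving the journal tail */
	if (pwrite(journal_fd, &new_tail, sizeof(new_tail), 0) !=
	    (ssize_t)sizeof(new_tail))
		return -1;

	return fsync(journal_fd) != 0 ? -1 : 0;
}
```

Calling it with use_barrier == 0 corresponds to the case the patch fixes
(same device for journal and filesystem, no flush before the superblock
update).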

-- 
Jan Kara <jack@xxxxxxx>
SuSE CR Labs