Re: [RFC] ext3: per-process soft-syncing data=ordered mode

Diego Calleja <diegocg@xxxxxxxxx> · Thu, 24 Jan 2008 22:50:26 +0100

El Thu, 24 Jan 2008 23:36:00 +0300, Al Boldi <a1426z@xxxxxxxxx> escribió:

> Greetings!
> 
> data=ordered mode has proven reliable over the years, and it does this by 
> ordering filedata flushes before metadata flushes.  But this sometimes 
> causes contention in the order of a 10x slowdown for certain apps, either 
> due to the misuse of fsync or due to inherent behaviour like db's, as well 
> as inherent starvation issues exposed by the data=ordered mode.

There's a related bug in bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=9546

The diagnostic from Jan Kara is different though, but I think it may be the same
problem...

"One process does data-intensive load. Thus in the ordered mode the
transaction is tiny but has tons of data buffers attached. If commit
happens, it takes a long time to sync all the data before the commit
can proceed... In the writeback mode, we don't wait for data buffers, in
the journal mode amount of data to be written is really limited by the
maximum size of a transaction and so we write by much smaller chunks
and better latency is thus ensured."


I'm hitting this bug too...it's surprising that there's not many people
reporting more bugs about this, because it's really annoying.


There's a patch by Jan Kara (that I'm including here because bugzilla didn't
include it and took me a while to find it) which I don't know if it's supposed to
fix the problem , but it'd be interesting to try:




Don't allow too much data buffers in a transaction.

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 08ff6c7..e6f9dd6 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -163,7 +163,7 @@ repeat_locked:
 	spin_lock(&transaction->t_handle_lock);
 	needed = transaction->t_outstanding_credits + nblocks;
 
-	if (needed > journal->j_max_transaction_buffers) {
+	if (needed > journal->j_max_transaction_buffers || atomic_read(&transaction->t_data_buf_count) > 32768) {
 		/*
 		 * If the current transaction is already too large, then start
 		 * to commit it: we can then go back and attach this handle to
@@ -1528,6 +1528,7 @@ static void __journal_temp_unlink_buffer(struct journal_head *jh)
 		return;
 	case BJ_SyncData:
 		list = &transaction->t_sync_datalist;
+		atomic_dec(&transaction->t_data_buf_count);
 		break;
 	case BJ_Metadata:
 		transaction->t_nr_buffers--;
@@ -1989,6 +1990,7 @@ void __journal_file_buffer(struct journal_head *jh,
 		return;
 	case BJ_SyncData:
 		list = &transaction->t_sync_datalist;
+		atomic_inc(&transaction->t_data_buf_count);
 		break;
 	case BJ_Metadata:
 		transaction->t_nr_buffers++;
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
index d9ecd13..6dd284a 100644
--- a/include/linux/jbd.h
+++ b/include/linux/jbd.h
@@ -541,6 +541,12 @@ struct transaction_s
 	int			t_outstanding_credits;
 
 	/*
+	 * Number of data buffers on t_sync_datalist attached to
+	 * the transaction.
+	 */
+	atomic_t		t_data_buf_count;
+
+	/*
 	 * Forward and backward links for the circular list of all transactions
 	 * awaiting checkpoint. [j_list_lock]
 	 */
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html