The patch titled

     ext3: fix commit of ordered data buffers

has been removed from the -mm tree.  Its filename is

     ext3-fix-commit-of-ordered-data-buffers.patch

This patch was dropped because it is obsolete

------------------------------------------------------
Subject: ext3: fix commit of ordered data buffers
From: Jan Kara <jack@xxxxxxx>

This fixes the handling of ordered data buffers in
journal_commit_transaction().  Currently the code simply moves a buffer from
t_sync_datalist to the t_locked list when the buffer is locked.  But in
theory (I am not sure this currently really happens for ext3 ordered data
buffers) a buffer can be locked just for "examination" purposes without being
sent to disk.  Hence it could happen that we fail to write an ordered buffer
when we should.

What the patch does is write out the buffer whenever it is dirty, even if it
is locked.  It can therefore happen that we call ll_rw_block() on a buffer
that is currently under write-out, but in that case ll_rw_block() just waits
until the IO completes and returns (as the buffer is no longer dirty), and
this race with pdflush should be rare enough not to affect performance.  The
patch also moves the buffer to the t_locked list immediately after queueing
it for write-out, as otherwise we would have to detect which buffers have
already been queued, and that is not nice.  This changes the logic of the
t_locked list a bit: for users other than journal_commit_transaction(), an
unlocked buffer on that list no longer means the buffer has already been
written.  But the only place that cares about this change is
journal_unmap_buffer(), and I changed it to check that the buffer is not
dirty.

When a buffer is locked it does not mean that write-out is in progress.  We
have to check whether the buffer is dirty and, if it is, submit it for
write-out.  We unconditionally move the buffer to t_locked_list so that we do
not confuse an already-processed buffer with one not yet given to
ll_rw_block().  This subtly changes the meaning of buffer states in
t_locked_list: an unlocked buffer (for list users other than
journal_commit_transaction()) does not mean it has been written.  But only
journal_unmap_buffer() cares, and it now checks that the buffer is not dirty.

Signed-off-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 fs/jbd/commit.c      |   51 ++++++++++++++++++++++++++++++++-------------------
 fs/jbd/transaction.c |    6 +++---
 2 files changed, 35 insertions(+), 22 deletions(-)

diff -puN fs/jbd/commit.c~ext3-fix-commit-of-ordered-data-buffers fs/jbd/commit.c
--- devel/fs/jbd/commit.c~ext3-fix-commit-of-ordered-data-buffers	2005-09-13 18:28:40.000000000 -0700
+++ devel-akpm/fs/jbd/commit.c	2005-09-13 18:28:40.000000000 -0700
@@ -337,19 +337,32 @@ write_out_data:
 		jh = commit_transaction->t_sync_datalist;
 		commit_transaction->t_sync_datalist = jh->b_tnext;
 		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			BUFFER_TRACE(bh, "locked");
+
+		/*
+		 * If the buffer is dirty, it needs a write-out (the write-out
+		 * may be already in progress but that should be rare so giving
+		 * the buffer once more to ll_rw_block() should not affect
+		 * the performance). We move the buffer out of t_sync_datalist
+		 * to t_locked_list to make a progress. This handling of dirty
+		 * buffer has the disadvantage that for functions different
+		 * from journal_commit_transaction() unlocked buffer in
+		 * t_locked_list does not mean it has been written. *BUT* the
+		 * only place that cares is journal_unmap_buffer() and that
+		 * checks that the buffer is not dirty.
+		 *
+		 * If the buffer is locked and not dirty, someone has submitted
+		 * the buffer for IO before us so we just move the buffer to
+		 * t_locked_list (to wait for IO completion).
+		 */
+		if (buffer_locked(bh) || buffer_dirty(bh)) {
+			BUFFER_TRACE(bh, "locked or dirty");
 			if (!inverted_lock(journal, bh))
 				goto write_out_data;
 			__journal_temp_unlink_buffer(jh);
 			__journal_file_buffer(jh, commit_transaction,
						BJ_Locked);
 			jbd_unlock_bh_state(bh);
-			if (lock_need_resched(&journal->j_list_lock)) {
-				spin_unlock(&journal->j_list_lock);
-				goto write_out_data;
-			}
-		} else {
+			if (buffer_dirty(bh)) {
 				BUFFER_TRACE(bh, "start journal writeout");
 				get_bh(bh);
@@ -363,19 +376,19 @@ write_out_data:
 					bufs = 0;
 					goto write_out_data;
 				}
-			} else {
-				BUFFER_TRACE(bh, "writeout complete: unfile");
-				if (!inverted_lock(journal, bh))
-					goto write_out_data;
-				__journal_unfile_buffer(jh);
-				jbd_unlock_bh_state(bh);
-				journal_remove_journal_head(bh);
-				put_bh(bh);
-				if (lock_need_resched(&journal->j_list_lock)) {
-					spin_unlock(&journal->j_list_lock);
-					goto write_out_data;
-				}
 			}
+		} else {
+			BUFFER_TRACE(bh, "writeout complete: unfile");
+			if (!inverted_lock(journal, bh))
+				goto write_out_data;
+			__journal_unfile_buffer(jh);
+			jbd_unlock_bh_state(bh);
+			journal_remove_journal_head(bh);
+			put_bh(bh);
+		}
+		if (lock_need_resched(&journal->j_list_lock)) {
+			spin_unlock(&journal->j_list_lock);
+			goto write_out_data;
 		}
 	}
diff -puN fs/jbd/transaction.c~ext3-fix-commit-of-ordered-data-buffers fs/jbd/transaction.c
--- devel/fs/jbd/transaction.c~ext3-fix-commit-of-ordered-data-buffers	2005-09-13 18:28:40.000000000 -0700
+++ devel-akpm/fs/jbd/transaction.c	2005-09-13 18:28:40.000000000 -0700
@@ -1814,11 +1814,11 @@ static int journal_unmap_buffer(journal_
 			}
 		}
 	} else if (transaction == journal->j_committing_transaction) {
-		if (jh->b_jlist == BJ_Locked) {
+		if (jh->b_jlist == BJ_Locked && !buffer_dirty(bh)) {
 			/*
 			 * The buffer is on the committing transaction's locked
-			 * list.  We have the buffer locked, so I/O has
-			 * completed.  So we can nail the buffer now.
+			 * list.  We have the buffer locked and !dirty, so I/O
+			 * has completed.  So we can nail the buffer now.
 			 */
 			may_free = __dispose_buffer(jh, transaction);
 			goto zap_buffer;
_

Patches currently in -mm which might be from jack@xxxxxxx are

origin.patch
use-list_add_tail-instead-of-list_add.patch
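
For readers who want to follow the new ordering logic without the JBD sources
at hand, the per-buffer decision that the commit.c hunks implement can be
summarised by the small stand-alone sketch below.  This is illustrative
user-space C only, not kernel code: classify(), buffer_action and the two
booleans are invented for this example; in the kernel the tests are
buffer_locked()/buffer_dirty() on a real buffer_head inside the write_out_data
loop.

/*
 * Stand-alone sketch of the per-buffer decision after the patch.
 * Illustrative only -- classify() and buffer_action are not kernel APIs.
 */
#include <stdbool.h>
#include <stdio.h>

enum buffer_action {
	SUBMIT_AND_FILE_LOCKED,	/* dirty: give to ll_rw_block(), move to t_locked_list */
	FILE_LOCKED_ONLY,	/* locked but clean: IO already in flight, wait on t_locked_list */
	UNFILE			/* clean and unlocked: write-out complete, drop the buffer */
};

static enum buffer_action classify(bool locked, bool dirty)
{
	/*
	 * Mirrors: if (buffer_locked(bh) || buffer_dirty(bh)) { ...
	 *              if (buffer_dirty(bh)) submit; ... } else unfile;
	 */
	if (dirty)
		return SUBMIT_AND_FILE_LOCKED;
	if (locked)
		return FILE_LOCKED_ONLY;
	return UNFILE;
}

int main(void)
{
	static const char *const action_name[] = {
		"submit for write-out and move to t_locked_list",
		"move to t_locked_list and wait for the IO",
		"unfile: write-out already complete",
	};
	int i;

	for (i = 0; i < 4; i++) {
		bool locked = (i & 1) != 0;
		bool dirty = (i & 2) != 0;

		printf("locked=%d dirty=%d -> %s\n",
		       locked, dirty, action_name[classify(locked, dirty)]);
	}
	return 0;
}

The matching change on the journal_unmap_buffer() side is the extra
!buffer_dirty(bh) test: a buffer on BJ_Locked may now be unlocked yet still
dirty (not yet submitted), so only a clean BJ_Locked buffer may be treated as
written and disposed of.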