On Thu, 2008-05-01 at 08:16 -0700, Badari Pulavarty wrote:
> Hi Andrew & Jan,
>
> I was able to reproduce the customer problem involving DIO
> (invalidate_inode_pages2) problem by writing simple testcase
> to keep writing to a file using buffered writes and DIO writes
> forever in a loop. I see DIO writes fail with -EIO.
>
> After a long debug, found 2 cases how this could happen.
> These are race conditions with journal_try_to_free_buffers()
> and journal_commit_transaction().
>
> 1) journal_submit_data_buffers() tries to get bh_state lock. If
> try lock fails, it drops the j_list_lock and sleeps for
> bh_state lock, while holding a reference on the buffer.
> In the meanwhile, journal_try_to_free_buffers() can clean up the
> journal head and call try_to_free_buffers(). try_to_free_buffers()
> would fail due to the reference held by journal_submit_data_buffers()
> - which in turn causes failures for DIO (invalidate_inode_pages2()).
>
> 2) When the buffer is on t_locked_list waiting for IO to finish,
> we hold a reference and give up the cpu, if we can't get
> bh_state lock. This causes try_to_free_buffers() to fail.
>

Besides these two, I think there are two more race conditions with
journal_try_to_free_buffers() inside
journal_commit_transaction()->journal_submit_data_buffers():

3) When journal_submit_data_buffers() sees that a buffer is dirty but
fails to lock it (call it bh1), it releases j_list_lock and submits the
buffers collected from the previous checks, with the reference to bh1
still held. During this time journal_try_to_free_buffers() could clean
up the journal head of bh1 and remove it from t_syncdata_list.
try_to_free_buffers() would then fail because of the reference held by
journal_submit_data_buffers().

	...
	if (buffer_dirty(bh)) {
		if (test_set_buffer_locked(bh)) {
			BUFFER_TRACE(bh, "needs blocking lock");
			spin_unlock(&journal->j_list_lock);
<-- j_list_lock is released here without put_bh(bh), so
    journal_try_to_free_buffers() could come in and remove
    this bh from t_syncdata_list
			/* Write out all data to prevent deadlocks */
			journal_do_submit_data(wbuf, bufs);
			bufs = 0;
			lock_buffer(bh);
			spin_lock(&journal->j_list_lock);
<-- the check continues here without validating that the bh
    is still on t_syncdata_list
		}
		locked = 1;
	}

4) When journal_commit_transaction() goes through t_locked_list and
waits for a buffer to be unlocked, it still holds the reference to the
buffer but releases j_list_lock. That gives journal_try_to_free_buffers()
a chance to come in and remove the buffer from t_locked_list, yet
journal_commit_transaction() continues as if the buffer were still on
the locked list.

	while (commit_transaction->t_locked_list) {
		struct buffer_head *bh;

		jh = commit_transaction->t_locked_list->b_tprev;
		bh = jh2bh(jh);
		get_bh(bh);
		if (buffer_locked(bh)) {
			spin_unlock(&journal->j_list_lock);
			wait_on_buffer(bh);
			if (unlikely(!buffer_uptodate(bh)))
				err = -EIO;
			spin_lock(&journal->j_list_lock);
		}

Mingming