On Fri 22-03-13 20:18:41, Dmitry Monakhov wrote: > Following race is possible > [kjournald2] other_task > jbd2_journal_commit_transaction() > j_state = T_FINISHED; > spin_unlock(&journal->j_list_lock); > ->jbd2_journal_remove_checkpoint() > ->jbd2_journal_free_transaction(); > ->kmem_cache_free(transaction) > ->j_commit_callback(journal, transaction); > -> USE_AFTER_FREE > > WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250() > Hardware name: > list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b > Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod > Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107 > Call Trace: > [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0 > [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50 > [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0 > [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250 > [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0 > [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570 > [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0 > [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0 > [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0 > [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40 > [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80 > [<ffffffff810ac6be>] kthread+0x10e/0x120 > [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 > [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0 > [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 > > In order to demonstrace this issue one should mount ext4 with -odiscard option > on SSD disk. This makes callback longer and race window becomes wider. > > In order to fix this we should mark transaction as finished only after > one of contidions are true: > 1) Checkpoint lists are empty so we are shure what jbd2_journal_remove_checkpoint() > can not be called under our feets. > 2) All callbacks have completed Nice catch! One minor comment below: > Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx> > --- > fs/jbd2/commit.c | 28 +++++++++++++++++++++++----- > include/linux/jbd2.h | 1 + > 2 files changed, 24 insertions(+), 5 deletions(-) > > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c > index 750c701..31541f7 100644 > --- a/fs/jbd2/commit.c > +++ b/fs/jbd2/commit.c > @@ -1134,7 +1134,7 @@ restart_loop: > journal->j_stats.run.rs_blocks_logged += stats.run.rs_blocks_logged; > spin_unlock(&journal->j_history_lock); > > - commit_transaction->t_state = T_FINISHED; > + commit_transaction->t_state = T_COMMIT_CALLBACK; > J_ASSERT(commit_transaction == journal->j_committing_transaction); > journal->j_commit_sequence = commit_transaction->t_tid; > journal->j_committing_transaction = NULL; > @@ -1149,12 +1149,20 @@ restart_loop: > journal->j_average_commit_time*3) / 4; > else > journal->j_average_commit_time = commit_time; > - write_unlock(&journal->j_state_lock); > > + /* > + * If checkpoint lists are empty it is safe to mark transaction as > + * finished because no one will destroy it in under our feets. > + */ > if (commit_transaction->t_checkpoint_list == NULL && > commit_transaction->t_checkpoint_io_list == NULL) { > - __jbd2_journal_drop_transaction(journal, commit_transaction); > to_free = 1; > + commit_transaction->t_state = T_FINISHED; > + } > + write_unlock(&journal->j_state_lock); > + > + if (to_free) { > + __jbd2_journal_drop_transaction(journal, commit_transaction); > } else { > if (journal->j_checkpoint_transactions == NULL) { > journal->j_checkpoint_transactions = commit_transaction; > @@ -1179,8 +1187,18 @@ restart_loop: > trace_jbd2_end_commit(journal, commit_transaction); > jbd_debug(1, "JBD2: commit %d complete, head %d\n", > journal->j_commit_sequence, journal->j_tail_sequence); > - if (to_free) > - jbd2_journal_free_transaction(commit_transaction); > > + write_lock(&journal->j_state_lock); > + spin_lock(&journal->j_list_lock); > + commit_transaction->t_state = T_FINISHED; > + /* Recheck checkpoint lists after j_list_lock was dropped */ > + if (commit_transaction->t_checkpoint_list == NULL && > + commit_transaction->t_checkpoint_io_list == NULL) { > + if (!to_free) > + __jbd2_journal_drop_transaction(journal, commit_transaction); > + jbd2_journal_free_transaction(commit_transaction); > + } > + spin_unlock(&journal->j_list_lock); > + write_unlock(&journal->j_state_lock); > wake_up(&journal->j_wait_done_commit); > } Hm, I'd just link the transaction into checkpoint lists when changing state to T_COMMIT_CALLBACK and then handle all the dropping / freeing in one place after setting state to T_FINISHED. It should be slightly easier to read. Also please add a comment before the call of callback that we have to call it there because once transaction is in T_FINISHED state, checkpointing code can free it. Honza > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h > index 50e5a5e..608ea13 100644 > --- a/include/linux/jbd2.h > +++ b/include/linux/jbd2.h > @@ -480,6 +480,7 @@ struct transaction_s > T_COMMIT, > T_COMMIT_DFLUSH, > T_COMMIT_JFLUSH, > + T_COMMIT_CALLBACK, > T_FINISHED > } t_state; > > -- > 1.7.1 > -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html