On Thu 04-10-18 13:50:12, Lukas Czerner wrote:
> On Thu, Oct 04, 2018 at 12:46:40PM +0200, Jan Kara wrote:
> > The code cleaning transaction's lists of checkpoint buffers has a bug
> > where it increases bh refcount only after releasing
> > journal->j_list_lock. Thus the following race is possible:
> >
> > CPU0                                    CPU1
> > jbd2_log_do_checkpoint()
> >                                         jbd2_journal_try_to_free_buffers()
> >                                           __journal_try_to_free_buffer(bh)
> >   ...
> >   while (transaction->t_checkpoint_io_list)
> >   ...
> >     if (buffer_locked(bh)) {
> >
> > <-- IO completes now, buffer gets unlocked -->
> >
> >       spin_unlock(&journal->j_list_lock);
> >                                             spin_lock(&journal->j_list_lock);
> >                                             __jbd2_journal_remove_checkpoint(jh);
> >                                             spin_unlock(&journal->j_list_lock);
> >                                           try_to_free_buffers(page);
> >       get_bh(bh) <-- accesses freed bh
> >
> > Fix the problem by grabbing bh reference before unlocking
> > journal->j_list_lock.
>
> Hi Jan,
>
> nice catch. The patch looks good, you can add
>
> Reviewed-by: Lukas Czerner <lczerner@xxxxxxxxxx>
>
> Btw, do you by any chance have a reproducer for this ?

No, syzbot hit it but the race window is really small so I don't think you
can create reasonably reliable reproducer...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
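
For illustration only, the ordering discussed in this thread can be modelled in userspace. This is a minimal single-threaded sketch, not the actual patch: struct fake_bh, lock_list(), unlock_list(), cpu1_try_to_free() and cpu0_checkpoint() are all hypothetical names, the lock is a plain flag standing in for journal->j_list_lock, and "freeing" is a flag rather than a real free. It replays the interleaving from the quoted diagram deterministically, once with the reference taken after the unlock and once with it taken before.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head; not the kernel type. */
struct fake_bh {
    int  refcount;  /* models the bh reference count */
    bool freed;     /* set once the model "frees" the buffer */
};

/* Hypothetical stand-in for journal->j_list_lock (single-threaded model). */
static bool list_locked;
static void lock_list(void)   { assert(!list_locked); list_locked = true; }
static void unlock_list(void) { assert(list_locked);  list_locked = false; }

/* CPU1 in the model: take the buffer off the checkpoint list under the
 * lock, then free it unless someone still holds a reference. */
static void cpu1_try_to_free(struct fake_bh *bh)
{
    lock_list();
    /* the checkpoint-list removal would happen here */
    unlock_list();
    if (bh->refcount == 0)
        bh->freed = true;
}

/* CPU0 in the model. Returns true if it would have touched a freed buffer.
 * pin_under_lock selects the fixed ordering (true) or the buggy one (false). */
static bool cpu0_checkpoint(struct fake_bh *bh, bool pin_under_lock)
{
    lock_list();
    if (pin_under_lock)
        bh->refcount++;         /* fixed: take the reference before unlocking */
    unlock_list();

    cpu1_try_to_free(bh);       /* CPU1 runs inside the race window */

    if (!pin_under_lock) {
        if (bh->freed)
            return true;        /* buggy: the late get would hit a freed bh */
        bh->refcount++;
    }
    bh->refcount--;             /* drop the reference when done */
    return false;
}
```

With pin_under_lock set, the freeing side sees a non-zero count and leaves the buffer alone; without it, the buffer is already gone by the time the reference would be taken.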