On Wed 26-06-24 21:24:13, Zhang Yi wrote: > On 2024/6/26 19:22, Jan Kara wrote: > > On Wed 26-06-24 15:38:42, Zhang Yi wrote: > >> On 2024/6/25 1:01, Jan Kara wrote: > >>> Commit 9f356e5a4f12 ("jbd2: Account descriptor blocks into > >>> t_outstanding_credits") started to account descriptor blocks into > >>> transactions outstanding credits. However it didn't appropriately > >>> decrease the maximum amount of credits available to userspace. Thus if > >>> the filesystem requests a transaction smaller than > >>> j_max_transaction_buffers but large enough that when descriptor blocks > >>> are added the size exceeds j_max_transaction_buffers, we confuse > >>> add_transaction_credits() into thinking previous handles have grown the > >>> transaction too much and enter infinite journal commit loop in > >>> start_this_handle() -> add_transaction_credits() trying to create > >>> transaction with enough credits available. > >> > >> I understand that the incorrect max transaction limit in > >> start_this_handle() could lead to infinite loop in > >> start_this_handle()-> add_transaction_credits() with large enough > >> userspace credits (from j_max_transaction_buffers - overheads to > >> j_max_transaction_buffers), but I don't get how could it lead to ran > >> out of space in the journal commit traction? IIUC, below codes in > >> add_transaction_credits() could make sure that we have enough space > >> when committing traction: > >> > >> static int add_transaction_credits() > >> { > >> ... > >> if (jbd2_log_space_left(journal) < journal->j_max_transaction_buffers) { > >> ... > >> return 1; > >> ... > >> } > >> ... > >> } > >> > >> I can't open and download the image Alexander gave, so I can't get to > >> the bottom of this issue, please let me know what happened with > >> jbd2_journal_next_log_block(). > > > > Sure. So what was exactly happening is a loop like this: > > > > start_this_handle() > > blocks = 252 (handle->h_total_credits) > > - starts a new transaction > > - t_outstanding_credits set to 6 to account for the commit block and > > descriptor blocks > > add_transaction_credits(journal, 252, 0) > > needed = atomic_add_return(252, &t->t_outstanding_credits); > > if (needed > journal->j_max_transaction_buffers) { > > /* Yes, this is exceeded due to descriptor blocks being in > > * t_outstanding_credits */ > > ... > > wait_transaction_locked(journal); > > - this commits an empty transaction - contains only the commit > > block > > return 1 > > goto repeat > > > > So we commit single block transactions in a loop until we exhaust all the > > journal space. The condition in add_transaction_credits() whether there's > > enough space in the journal is never reached so we don't ever push the > > journal tail to make space in the journal. > > > > mm-hm, ha, yeah, this will lead to submit an empty transaction in each loop, > but I still have one question, although the journal tail can't be updated > in add_transaction_credits(), I think the journal tail should be updated in > jbd2_journal_commit_transaction()->jbd2_update_log_tail() since we don't > add empty transactions to journal->j_checkpoint_transactions list, the loop > in jbd2_journal_commit_transaction() should like this: > > ... > jbd2_journal_commit_transaction() > update_tail = jbd2_journal_get_log_tail() > //journal->j_checkpoint_transactions should be NULL, tid is the > //committing transaction's tid, which should be large than the > //tail tid since the second loop, so this should be true after > //the second loop > if (freed < journal->j_max_transaction_buffers) > update_tail = 0; > //update_tail should be true after j_max_transaction_buffers loop > > jbd2_update_log_tail() //j_free should be increased in each > //j_max_transaction_buffers loop > if (tid_gt(tid, journal->j_tail_sequence)) //it's true > __jbd2_update_log_tail() > journal->j_free += freed; //update log tail and increase j_free > //j_max_transaction_buffers blocks > ... > > As I understand it, the journal space can't be exhausted, I don't know how > it happened, am I missing something? Well, at the beginning of the journal there are a few non-empty transactions whose buffers didn't get checkpointed before we entered the commit loop. So we cannot just push the journal tail without doing a checkpoint and we never call into the checkpointing code in the commit loop... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR