On 2024/6/26 22:55, Jan Kara wrote: > On Wed 26-06-24 21:24:13, Zhang Yi wrote: >> On 2024/6/26 19:22, Jan Kara wrote: >>> On Wed 26-06-24 15:38:42, Zhang Yi wrote: >>>> On 2024/6/25 1:01, Jan Kara wrote: >>>>> Commit 9f356e5a4f12 ("jbd2: Account descriptor blocks into >>>>> t_outstanding_credits") started to account descriptor blocks into >>>>> transactions outstanding credits. However it didn't appropriately >>>>> decrease the maximum amount of credits available to userspace. Thus if >>>>> the filesystem requests a transaction smaller than >>>>> j_max_transaction_buffers but large enough that when descriptor blocks >>>>> are added the size exceeds j_max_transaction_buffers, we confuse >>>>> add_transaction_credits() into thinking previous handles have grown the >>>>> transaction too much and enter infinite journal commit loop in >>>>> start_this_handle() -> add_transaction_credits() trying to create >>>>> transaction with enough credits available. >>>> >>>> I understand that the incorrect max transaction limit in >>>> start_this_handle() could lead to infinite loop in >>>> start_this_handle()-> add_transaction_credits() with large enough >>>> userspace credits (from j_max_transaction_buffers - overheads to >>>> j_max_transaction_buffers), but I don't get how could it lead to ran >>>> out of space in the journal commit traction? IIUC, below codes in >>>> add_transaction_credits() could make sure that we have enough space >>>> when committing traction: >>>> >>>> static int add_transaction_credits() >>>> { >>>> ... >>>> if (jbd2_log_space_left(journal) < journal->j_max_transaction_buffers) { >>>> ... >>>> return 1; >>>> ... >>>> } >>>> ... >>>> } >>>> >>>> I can't open and download the image Alexander gave, so I can't get to >>>> the bottom of this issue, please let me know what happened with >>>> jbd2_journal_next_log_block(). >>> >>> Sure. So what was exactly happening is a loop like this: >>> >>> start_this_handle() >>> blocks = 252 (handle->h_total_credits) >>> - starts a new transaction >>> - t_outstanding_credits set to 6 to account for the commit block and >>> descriptor blocks >>> add_transaction_credits(journal, 252, 0) >>> needed = atomic_add_return(252, &t->t_outstanding_credits); >>> if (needed > journal->j_max_transaction_buffers) { >>> /* Yes, this is exceeded due to descriptor blocks being in >>> * t_outstanding_credits */ >>> ... >>> wait_transaction_locked(journal); >>> - this commits an empty transaction - contains only the commit >>> block >>> return 1 >>> goto repeat >>> >>> So we commit single block transactions in a loop until we exhaust all the >>> journal space. The condition in add_transaction_credits() whether there's >>> enough space in the journal is never reached so we don't ever push the >>> journal tail to make space in the journal. >>> >> >> mm-hm, ha, yeah, this will lead to submit an empty transaction in each loop, >> but I still have one question, although the journal tail can't be updated >> in add_transaction_credits(), I think the journal tail should be updated in >> jbd2_journal_commit_transaction()->jbd2_update_log_tail() since we don't >> add empty transactions to journal->j_checkpoint_transactions list, the loop >> in jbd2_journal_commit_transaction() should like this: >> >> ... >> jbd2_journal_commit_transaction() >> update_tail = jbd2_journal_get_log_tail() >> //journal->j_checkpoint_transactions should be NULL, tid is the >> //committing transaction's tid, which should be large than the >> //tail tid since the second loop, so this should be true after >> //the second loop >> if (freed < journal->j_max_transaction_buffers) >> update_tail = 0; >> //update_tail should be true after j_max_transaction_buffers loop >> >> jbd2_update_log_tail() //j_free should be increased in each >> //j_max_transaction_buffers loop >> if (tid_gt(tid, journal->j_tail_sequence)) //it's true >> __jbd2_update_log_tail() >> journal->j_free += freed; //update log tail and increase j_free >> //j_max_transaction_buffers blocks >> ... >> >> As I understand it, the journal space can't be exhausted, I don't know how >> it happened, am I missing something? > > Well, at the beginning of the journal there are a few non-empty > transactions whose buffers didn't get checkpointed before we entered the > commit loop. So we cannot just push the journal tail without doing a > checkpoint and we never call into the checkpointing code in the commit > loop... Oh, indeed, I see, thanks for explaining and this patch looks good to me. Reviewed-by: Zhang Yi <yi.zhang@xxxxxxxxxx> Thanks, Yi.