On Thu, Jan 11, 2024 at 10:08:51AM +1100, Dave Chinner wrote: > On Wed, Jan 10, 2024 at 03:03:24PM +0800, Long Li wrote: > > On Wed, Jan 10, 2024 at 08:43:31AM +1100, Dave Chinner wrote: > > > The issue is that the code as it stands doesn't handle object > > > recovery from multiple checkpoints with the same start lsn. The > > > easiest way to understand this is to look at the buffer submit logic > > > on completion of a checkpoint: > > > > > > if (log->l_recovery_lsn != trans->r_lsn && > > > ohead->oh_flags & XLOG_COMMIT_TRANS) { > > > error = xfs_buf_delwri_submit(buffer_list); > > > if (error) > > > return error; > > > log->l_recovery_lsn = trans->r_lsn; > > > } > > > > > > This submits the buffer list on the first checkpoint that completes > > > with a new start LSN, not when all the checkpoints with the same > > > start LSN complete. i.e.: > > > > > > checkpoint start LSN commit lsn submission on commit record > > > A 32 63 buffer list for A > > > B 64 68 buffer list for B > > > C 64 92 nothing, start lsn unchanged > > > D 64 127 nothing, start lsn unchanged > > > E 128 192 buffer list for C, D and E > > > > > > > I have different understanding about this code. In the first checkpoint's > > handle on commit record, buffer_list is empty and l_recovery_lsn update to > > the first checkpoint's lsn, the result is that each checkpoint's submit > > logic try to submit the buffers which was added to buffer list in checkpoint > > recovery of previous LSN. > > > > xlog_do_recovery_pass > > LIST_HEAD (buffer_list); > > xlog_recover_process > > xlog_recover_process_data > > xlog_recover_process_ophdr > > xlog_recovery_process_trans > > if (log->l_recovery_lsn != trans->r_lsn && > > ohead->oh_flags & XLOG_COMMIT_TRANS) { > > xfs_buf_delwri_submit(buffer_list); //submit buffer list > > log->l_recovery_lsn = trans->r_lsn; > > } > > xlog_recovery_process_trans > > xlog_recover_commit_trans > > xlog_recover_items_pass2 > > item->ri_ops->commit_pass2 > > xlog_recover_buf_commit_pass2 > > xfs_buf_delwri_queue(bp, buffer_list) //add bp to buffer list > > if (!list_empty(&buffer_list)) > > /* submit buffers that was added in checkpoint recovery of last LSN */ > > xfs_buf_delwri_submit(&buffer_list) > > > > So, I think it should be: > > > > checkpoint start LSN commit lsn submission on commit record > > A 32 63 nothing, buffer list is empty > > B 64 68 buffer list for A > > C 64 92 nothing, start lsn unchanged > > D 64 127 nothing, start lsn unchanged > > E 128 192 buffer list for B, C and D Yeah, I managed to look through some of this code yesterday but hadn't got back to this. I suspect the disconnect here is that the buffer list isn't populated until we process the commit record for the associated log transaction, and that occurs after the check to submit the buffer list. So when the commit record of A is seen, the recovery LSN is updated to 32 and the buffer list is still empty, and then is subsequently populated based on the content of the transaction. From there, not until we see a commit record for a transaction with a start LSN != 32 is the buffer list submitted. This is how recovery knows there are no more metadata lsn == 32 items, and the buffer list may include any number of checkpoints that start at the current recovery lsn. In general, this relies on proper commit record ordering which AFAIA remains a pretty fundamental rule of XFS logging. It seems the confusion has been resolved in any event, but something that comes to mind as possibly beneficial to future readers is to perhaps consider turning this submit check into a small helper with a proper name and any tweaks that might help clarify the existing comment. I think Dave's followup comments to invoke shutdown rather than play games with cancellation make sense, so I'm particularly wondering if we could just create an xlog_recovery_process_buffers() or some such helper that could be reused in both places by conditionally doing the shutdown/submit on error, and then documents everything nicely in once place. Just a thought, however. It may not be worth it depending on how ugly it looks. Brian > > You are right, I made a mistake in determining the order of buffer > list submission vs checkpoint recovery taht builds a given buffer > list. Mistakes happen when you only look at complex code once every > few years. I will go back and look at the original patch again with > this in mind. > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >