Re: [PATCH 3/4] xfs: create perag structures as soon as possible during log recovery

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 16 Sep 2024 11:28:26 +1000

On Tue, Sep 10, 2024 at 07:28:46AM +0300, Christoph Hellwig wrote:
> An unclean log can contain both the transaction that created a new
> allocation group and the first transaction that is freeing space from
> it, in which case the extent free item recovery requires the perag
> structure to be present.
>
> Currently the perag structures are only created after log recovery
> has completed, leading to a warning and file system shutdown for the
> above case.

I'm missing something here. Intents aren't processed until the log
has been recovered, and queuing an intent for processing does not
require the perag to be present. We don't take perag references
until we are recovering the intent, i.e. once we've completed
journal recovery and haven't found the corresponding EFD.

That leaves the EFI on log->r_dfops, and we then run ->recover_work
in the second phase of recovery. It is xfs_extent_free_recover_work()
that creates the new transaction and runs the EFI processing that
requires the perag references, isn't it?

IOWs, I don't see where the initial EFI/EFD recovery during the
checkpoint processing requires the newly created perags to be
present in memory for processing incomplete EFIs before the journal
recovery phase has completed.
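
To make the ordering I'm describing concrete, here is a toy model of the
two phases. It is only a sketch of my reading of the flow, not kernel
code: every toy_* name, the item layout and the fixed-size pending array
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_item_type { TOY_GROWFS_SB, TOY_EFI, TOY_EFD };

struct toy_item {
	enum toy_item_type	type;
	unsigned int		id;	 /* pairs an EFI with its EFD */
	unsigned int		agno;	 /* AG the EFI frees space in */
	unsigned int		agcount; /* new AG count for TOY_GROWFS_SB */
};

#define TOY_MAX_PENDING	16

struct toy_log {
	unsigned int	sb_agcount;	/* AG count replayed from the journal */
	unsigned int	perag_count;	/* in-memory perags that exist */
	struct toy_item	pending[TOY_MAX_PENDING];
	size_t		npending;
};

/* Phase 1: replay items. EFIs are only queued; no perag is looked up. */
static int toy_replay(struct toy_log *log, const struct toy_item *items,
		      size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_item *it = &items[i];

		switch (it->type) {
		case TOY_GROWFS_SB:
			log->sb_agcount = it->agcount;
			break;
		case TOY_EFI:
			if (log->npending == TOY_MAX_PENDING)
				return -1;
			log->pending[log->npending++] = *it;
			break;
		case TOY_EFD:
			/* An EFD cancels the matching queued EFI. */
			for (size_t j = 0; j < log->npending; j++) {
				if (log->pending[j].id == it->id) {
					log->pending[j] =
						log->pending[--log->npending];
					break;
				}
			}
			break;
		}
	}
	return 0;
}

/* Phase 2: process surviving intents; this is where a perag is needed. */
static int toy_recover_intents(const struct toy_log *log)
{
	for (size_t i = 0; i < log->npending; i++) {
		if (log->pending[i].agno >= log->perag_count)
			return -1;	/* perag lookup would fail */
	}
	return 0;
}
```

In this model the replay pass succeeds with only the old perags in
memory, and the failure can only show up in toy_recover_intents() if the
perag count has not been brought up to sb_agcount before it runs.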

> 
> Fix this by creating new perag structures and updating
> the in-memory superblock fields as soon as a buffer log item that covers
> the primary super block is recovered.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  fs/xfs/libxfs/xfs_log_recover.h |  2 ++
>  fs/xfs/xfs_buf_item_recover.c   | 16 +++++++++
>  fs/xfs/xfs_log_recover.c        | 59 ++++++++++++++-------------------
>  3 files changed, 43 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> index 521d327e4c89ed..d0e13c84422d0a 100644
> --- a/fs/xfs/libxfs/xfs_log_recover.h
> +++ b/fs/xfs/libxfs/xfs_log_recover.h
> @@ -165,4 +165,6 @@ void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
>  int xlog_recover_finish_intent(struct xfs_trans *tp,
>  		struct xfs_defer_pending *dfp);
>  
> +int xlog_recover_update_agcount(struct xfs_mount *mp, struct xfs_dsb *dsb);
> +
>  #endif	/* __XFS_LOG_RECOVER_H__ */
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 09e893cf563cb9..033821a56b6ac6 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -969,6 +969,22 @@ xlog_recover_buf_commit_pass2(
>  			goto out_release;
>  	} else {
>  		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> +
> +		/*
> +		 * Update the in-memory superblock and perag structures from the
> +		 * primary SB buffer.
> +		 *
> +		 * This is required because transactions running after growfs
> +		 * may require in-memory structures like the perag right after
> +		 * committing the growfs transaction that created the underlying
> +		 * objects.
> +		 */
> +		if ((xfs_blft_from_flags(buf_f) & XFS_BLFT_SB_BUF) &&
> +		    xfs_buf_daddr(bp) == 0) {
> +			error = xlog_recover_update_agcount(mp, bp->b_addr);
> +			if (error)
> +				goto out_release;
> +		}
>  	}

If we are going to keep this logic, can you do this as a separate
helper function? i.e.:

	if (inode buffer) {
		xlog_recover_do_inode_buffer();
	} else if (dquot buffer) {
		xlog_recover_do_dquot_buffer();
	} else if (superblock buffer) {
		xlog_recover_do_sb_buffer();
	} else {
		xlog_recover_do_reg_buffer();
	}

and

xlog_recover_do_sb_buffer()
{
	error = xlog_recover_do_reg_buffer();
	if (error || xfs_buf_daddr(bp) != XFS_SB_ADDR)
		return error;
	return xlog_recover_update_agcount();
}

>  
>  	/*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 2af02b32f419c2..7d7ab146cae758 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3334,6 +3334,30 @@ xlog_do_log_recovery(
>  	return error;
>  }
>  
> +int
> +xlog_recover_update_agcount(
> +	struct xfs_mount		*mp,
> +	struct xfs_dsb			*dsb)
> +{
> +	xfs_agnumber_t			old_agcount = mp->m_sb.sb_agcount;
> +	int				error;
> +
> +	xfs_sb_from_disk(&mp->m_sb, dsb);
> +	if (mp->m_sb.sb_agcount < old_agcount) {
> +		xfs_alert(mp, "Shrinking AG count in log recovery");
> +		return -EFSCORRUPTED;
> +	}
> +	mp->m_features |= xfs_sb_version_to_features(&mp->m_sb);

I'm not sure this is safe. The item order in the checkpoint recovery
isn't guaranteed to be exactly the same as when feature bits are
modified at runtime. Hence there could be items in the checkpoint
that haven't yet been recovered that are dependent on the original
sb feature mask being present.  It may be OK to do this at the end
of the checkpoint being recovered.

I'm also not sure why this feature update code is being changed
because it's not mentioned at all in the commit message.

> +	error = xfs_initialize_perag(mp, old_agcount, mp->m_sb.sb_agcount,
> +			mp->m_sb.sb_dblocks, &mp->m_maxagi);

Why do this if sb_agcount has not changed?  AFAICT it only iterates
the AGs already initialised and so skips them, then recalculates
inode32 and prealloc block parameters, which won't change. Hence
it's a total no-op for anything other than an actual ag count change
and should be skipped, right?
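
Something like the following is what I have in mind, written as a
standalone toy so the control flow is obvious. It is a sketch under
assumptions: struct toy_mount and both toy_* functions are hypothetical
stand-ins, -EINVAL stands in for -EFSCORRUPTED, and it models only the
AG count, not the rest of the superblock update.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the mount and perag setup; not kernel code. */
struct toy_mount {
	unsigned int	agcount;	/* in-memory AG count */
	unsigned int	perag_inits;	/* how many times perag setup ran */
};

/* Stand-in for the perag initialisation call; it just counts invocations. */
static int toy_initialize_perag(struct toy_mount *mp,
				unsigned int old_agcount,
				unsigned int new_agcount)
{
	(void)old_agcount;
	(void)new_agcount;
	mp->perag_inits++;
	return 0;
}

static int toy_update_agcount(struct toy_mount *mp, unsigned int new_agcount)
{
	unsigned int	old_agcount = mp->agcount;

	/* A shrinking AG count during recovery is treated as corruption. */
	if (new_agcount < old_agcount)
		return -EINVAL;	/* the patch returns -EFSCORRUPTED here */

	/* Nothing to create: skip the perag setup entirely. */
	if (new_agcount == old_agcount)
		return 0;

	mp->agcount = new_agcount;
	return toy_initialize_perag(mp, old_agcount, new_agcount);
}
```

With this shape, recovering a primary superblock buffer that did not
change the AG count never reaches the perag initialisation at all.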

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx