Re: [PATCH] xfs: add bounds checking to xlog_recover_process_ophdr

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 3 Jun 2024 15:36:59 +1000

On Mon, Jun 03, 2024 at 12:14:10PM +0800, lei lu wrote:
> Make sure xlog_op_header don't stray beyond valid memory region.
> Check if there is enough space for the fixed members of each
> xlog_op_header before visiting.
> 
> Signed-off-by: lei lu <llfamsec@xxxxxxxxx>

Finding a corrupted ophdr chain this deep in recovery implies that
there is either a runtime bug when writing out the log buffers
during checkpointing or there is memory corruption occurring during
log recovery after the log items have been read from the journal and
attached to the transaction structure for processing. If that's the
case, then addressing the buffer overrun is not fixing the
underlying issue that needs to be triaged and fixed.

The only other way I can think of that this might occur is malicious
corruption of the journal structures on disk. If that's the case,
then you need to state this explicitly so we know that this isn't
a potentially critical runtime bug we need to spend time on
immediately.

Hence we need to know how you found the issue, what it's symptoms
are, how to reproduce the issue, etc.  ie. you need to explain where
you found a CRC validated journal structure that is internally
inconsistent. That's -critical information- that we need to assess
the scope of the issue that you have uncovered.

I asked for you to provide this sort of detail to be provided with
your last overrun fix - we need to know how such failures come about
because they often are just a symptom of some other issue we failed
to capture. You might have all the details in your head, but none of
us know anything about what you are doing.

Can you describe what tools and processes you are using to find
these issues, and please try to include such descriptions in the
commit message for future fixes to issues you find?

> ---
>  fs/xfs/xfs_log_recover.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1251c81e55f9..660e79a07ce6 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2361,6 +2361,11 @@ xlog_recover_process_ophdr(
>  	unsigned int		len;
>  	int			error;
>  
> +	if (dp > end) {
> +		xfs_warn(log->l_mp, "%s: op header overrun", __func__);
> +		return -EFSCORRUPTED;
> +	}

Why did you put the check here, rather than...

> +
>  	/* Do we understand who wrote this op? */
>  	if (ohead->oh_clientid != XFS_TRANSACTION &&
>  	    ohead->oh_clientid != XFS_LOG) {
> @@ -2456,7 +2461,6 @@ xlog_recover_process_data(
>  
>  		ohead = (struct xlog_op_header *)dp;
>  		dp += sizeof(*ohead);
> -		ASSERT(dp <= end);

.... where the pointer is advanced and the overrun is exposed?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx