Re: [PATCH v2] xfs: avoid LR buffer overrun due to crafted h_len

Gao Xiang <hsiangkao@xxxxxxxxxx> · Thu, 3 Sep 2020 06:47:26 +0800

Hi Brian,

On Wed, Sep 02, 2020 at 01:38:59PM -0400, Brian Foster wrote:
> On Wed, Sep 02, 2020 at 10:19:23PM +0800, Gao Xiang wrote:

...

> > However, each log record could still have crafted
> > h_len and cause log record buffer overrun. So let's
> > check h_len for each log record as well instead.
> > 
> 
> Is this something you've observed or attempted to reproduce, or is this
> based on code inspection?

Thanks for your review.

based on code inspection, the logic seems straight-forward

in xlog_do_recovery_pass()
	...

	dbp = xlog_alloc_buffer(log, BTOBB(h_size));
					^ here uses h_size from the tail block
	if (!dbp) {
		kmem_free(hbp);
		return -ENOMEM;
	}

	if (tail_blk > head_blk) {
		while (blk_no < log->l_logBBsize) {
			xlog_bread
			xlog_valid_rec_header
			xlog_recover_process
		}
	}

	while (blk_no < head_blk) {
		xlog_bread
		xlog_valid_rec_header
		xlog_recover_process
	}

in xlog_recover_process()
	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
							^here
	...

and also xlog_recover_process_data()
	end = dp + be32_to_cpu(rhead->h_len);
	...
	while ((dp < end) && num_logops) {
		ohead = (struct xlog_op_header *)dp;
		(all things around dp/ohead if num_logops is crafted as well. 
		...
	}

> 
> > -	if (XFS_IS_CORRUPT(log->l_mp, hlen <= 0 || hlen > INT_MAX))
> > +	if (XFS_IS_CORRUPT(log->l_mp, hlen <= 0))
> 
> Why is the second part of the check removed?

if hlen <= hsize (hsize > 0) then hlen will be <= INT_MAX

> 
> >  		return -EFSCORRUPTED;
> > +
> > +	if (hsize && XFS_IS_CORRUPT(log->l_mp,
> > +				    hsize < be32_to_cpu(rhead->h_size)))
> > +		return -EFSCORRUPTED;
> > +	hsize = be32_to_cpu(rhead->h_size);
> 
> I'm a little confused why we take hsize as a parameter as well as read
> it from the record header. If we're validating a particular record,
> shouldn't we use the size as specified by that record?
> 
> Also FWIW I think pulling bits of logic out of the XFS_IS_CORRUPT()
> check makes this a little harder to read than just putting the entire
> logic statement within the macro.

It seems that is partially self-answered in the last part of the email.
So move the response to the last of the email...

> 
> > +
> > +	if (unlikely(hlen > hsize)) {
> 
> I think we've made a point to avoid the [un]likely() modifiers in XFS as
> they don't usually have a noticeable impact. I certainly wouldn't expect
> it to in log recovery.

Honestly, I really don't want to work on some topic about [un]likely,
I did a long discussion with Dan Carpenter and a couple of other people
last year, but *shrug*

For this case just simply bacause XFS_IS_CORRUPT() has this annotation,
and it seems xlog_valid_rec_header() logic will be changed in v3
if we leave the mkfs workaround logic as is.

> 
> > +		if (XFS_IS_CORRUPT(log->l_mp, hlen > log->l_mp->m_logbsize ||
> > +				   rhead->h_num_logops != cpu_to_be32(1)))
> > +			return -EFSCORRUPTED;
> > +
> > +		xfs_warn(log->l_mp,
> > +		"invalid iclog size (%d bytes), using lsunit (%d bytes)",
> > +			 hsize, log->l_mp->m_logbsize);
> > +		rhead->h_size = cpu_to_be32(log->l_mp->m_logbsize);
> 
> I also find updating the header structure as such down in a "validation
> helper" a bit obscured.

also the same words at the last of the email...

> 
> > +	}
> > +
> >  	if (XFS_IS_CORRUPT(log->l_mp,
> >  			   blkno > log->l_logBBsize || blkno > INT_MAX))
> >  		return -EFSCORRUPTED;
> ...
> > @@ -3096,7 +3100,7 @@ xlog_do_recovery_pass(
> >  			}
> >  			rhead = (xlog_rec_header_t *)offset;
> >  			error = xlog_valid_rec_header(log, rhead,
> > -						split_hblks ? blk_no : 0);
> > +					split_hblks ? blk_no : 0, h_size);
> >  			if (error)
> >  				goto bread_err2;
> >  
> > @@ -3177,7 +3181,7 @@ xlog_do_recovery_pass(
> >  			goto bread_err2;
> >  
> >  		rhead = (xlog_rec_header_t *)offset;
> > -		error = xlog_valid_rec_header(log, rhead, blk_no);
> > +		error = xlog_valid_rec_header(log, rhead, blk_no, h_size);
> >  		if (error)
> >  			goto bread_err2;
> 
> In these two cases we've already allocated the record header and data
> buffers and we're walking through the log records doing recovery. Given
> that, it seems like the purpose of the parameter is more to check the
> subsequent records against the size of the current record buffer. That
> seems like a reasonable check to incorporate, but I think the mkfs

Yes

> workaround logic is misplaced in a generic record validation helper.
> IIUC that is a very special case that should only apply to the first
> record in the log and only impacts the size of the buffer we allocate to
> read in the remaining records.
> 
> Can we rework this to leave the mkfs workaround logic as is and update
> the validation helper to check that each record length fits in the size
> of the buffer we've decided to allocate? I'd also suggest to rename the
> new parameter to something like 'bufsize' instead of 'h_size' to clarify
> what it actually means in the context of xlog_valid_rec_header().

Ok, that is fine. I will leave the mkfs workaround logic as is and rename
to bufsize.

Thanks,
Gao Xiang

> 
> Brian
> 
> >  
> > -- 
> > 2.18.1
> > 
>