Re: [PATCH v4 1/2] xfs: avoid LR buffer overrun due to crafted h_len

Gao Xiang <hsiangkao@xxxxxxxxxx> · Tue, 22 Sep 2020 23:28:21 +0800

Hi Brian,

On Tue, Sep 22, 2020 at 11:22:12AM -0400, Brian Foster wrote:
> On Thu, Sep 17, 2020 at 01:13:40PM +0800, Gao Xiang wrote:
> > Currently, crafted h_len has been blocked for the log
> > header of the tail block in commit a70f9fe52daa ("xfs:
> > detect and handle invalid iclog size set by mkfs").
> > 
> > However, each log record could still have crafted h_len
> > and cause log record buffer overrun. So let's check
> > h_len vs buffer size for each log record as well.
> > 
> > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> > ---
> > v3: https://lore.kernel.org/r/20200904082516.31205-2-hsiangkao@xxxxxxxxxx
> > 
> > changes since v3:
> >  - drop exception comment to avoid confusion (Brian);
> >  - check rhead->hlen vs buffer size to address
> >    the harmful overflow (Brian);
> > 
> > And as Brian requested previously, "Also, please test the workaround
> > case to make sure it still works as expected (if you haven't already)."
> > 
> > So I tested the vanilla/after upstream kernels with compiled xfsprogs-v4.3.0,
> > which was before mkfs fix 20fbd4593ff2 ("libxfs: format the log with
> > valid log record headers") got merged, and I generated a questionable
> > image followed by the instruction described in the related commit
> > messages with "mkfs.xfs -dsunit=512,swidth=4096 -f test.img" and
> > logprint says
> > 
> > cycle: 1        version: 2              lsn: 1,0        tail_lsn: 1,0
> > length of Log Record: 261632    prev offset: -1         num ops: 1
> > uuid: 7b84cd80-7855-4dc8-91ce-542c7d65ba99   format: little endian linux
> > h_size: 32768
> > 
> > so "length of Log Record" is overrun obviously, but I cannot reproduce
> > the described workaround case for vanilla/after kernels anymore.
> > 
> > I think the reason is due to commit 7f6aff3a29b0 ("xfs: only run torn
> > log write detection on dirty logs"), which changes the behavior
> > described in commit a70f9fe52daa8 ("xfs: detect and handle invalid
> > iclog size set by mkfs") from "all records at the head of the log
> > are verified for CRC errors" to "we can only run CRC verification
> > when the log is dirty because there's no guarantee that the log
> > data behind an unmount record is compatible with the current
> > architecture).", more details see codediff of a70f9fe52daa8.
> > 
> 
> If I follow correctly, you're saying that prior to commit 7f6aff3a29b0,
> log recovery would run a CRC pass on a clean log (with an unmount
> record) and this is where the old workaround code would kick in if the
> filesystem happened to be misformatted by mkfs. After said commit, the
> CRC pass is no longer run unless the log is dirty (for arch
> compatibility reasons), so we fall into the xlog_check_unmount_rec()
> path that does some careful (presumably arch agnostic) detection of a
> clean/dirty log based on whether the record just behind the head has a
> single unmount transaction. This function already uses h_len properly
> and only reads a single log block to determine whether the target is an
> unmount record, so doesn't have the same overflow risk as a full
> recovery pass.
> 
> Am I following that correctly? If so, the patch otherwise looks
> reasonable to me:

Yeah, that is what I was trying to say. Thanks for the review!

Thanks,
Gao Xiang

> 
> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> 
> > The timeline seems to be:
> >  https://lore.kernel.org/r/1447342648-40012-1-git-send-email-bfoster@xxxxxxxxxx
> >  a70f9fe52daa [PATCH v2 1/8] xfs: detect and handle invalid iclog size set by mkfs
> >  7088c4136fa1 [PATCH v2 7/8] xfs: detect and trim torn writes during log recovery
> >  https://lore.kernel.org/r/1457008798-58734-5-git-send-email-bfoster@xxxxxxxxxx
> >  7f6aff3a29b0 [PATCH 4/4] xfs: only run torn log write detection on dirty logs
> > 
> > so IMHO, the workaround a70f9fe52daa would only be useful between
> > 7088c4136fa1 ~ 7f6aff3a29b0.
> > 
> > Yeah, it relates to several old kernel commits/versions, my technical
> > analysis is as above. This patch doesn't actually change the real
> > original workaround logic. Even if the workground can be removed now,
> > that should be addressed with another patch and that is quite another
> > story.
> > 
> > Kindly correct me if I'm wrong :-)
> > 
> > Thanks,
> > Gao Xiang