[Bug 215783] kernel NULL pointer dereference and general protection fault in fs/xfs/xfs_buf_item_recover.c: xlog_recover_do_reg_buffer() when mount a corrupted image

https://bugzilla.kernel.org/show_bug.cgi?id=215783

--- Comment #2 from Dave Chinner (david@xxxxxxxxxxxxx) ---
On Fri, Apr 01, 2022 at 08:35:39AM +1100, Dave Chinner wrote:
> On Thu, Mar 31, 2022 at 08:07:08PM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215783
> > - Overview 
> > kernel NULL pointer dereference and general protection fault in
> > fs/xfs/xfs_buf_item_recover.c:xlog_recover_do_reg_buffer() when mounting a
> > corrupted image; sometimes causes a kernel hang
> > 
> > - Reproduce 
> > tested on kernel 5.17.1, 5.15.32
> > 
> > $ mkdir mnt
> > $ unzip tmp7.zip
> > $ ./mount.sh xfs 7  ##NULL pointer dereference
> > or
> > $ sudo mount -t xfs tmp7.img mnt ##general protection fault
> > 
> > - Kernel dump
> 
> You've now raised 4 bugs that all look very similar and are quite
> possibly all caused by the same corruption vector.
> Please do some triage on the failure to identify the
> source of the corruption that triggers this failure.

Ok, the log has been intentionally corrupted in a way that does not
happen in the real world: the iclog header at the tail of the
log has had the CRC zeroed, so CRC checking for media bit corruption
has been intentionally bypassed by the tool that corrupted the log.
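
To illustrate why a zeroed CRC matters, here is a minimal userspace
sketch, not the kernel's code. It assumes a checker that treats a
stored CRC of zero as "no checksum recorded" and skips verification;
the names crc_result and check_record() are hypothetical, and the
bitwise CRC32C routine is only there to keep the example
self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for this sketch only. */
enum crc_result { CRC_OK, CRC_MISMATCH, CRC_SKIPPED };

/* Plain bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Assumption for illustration: a stored CRC of zero means the record
 * was written without a checksum, so there is nothing to compare
 * against and the payload is accepted unverified.
 */
static enum crc_result check_record(uint32_t stored_crc,
				    const uint8_t *payload, size_t len)
{
	assert(payload != NULL);

	if (stored_crc == 0)
		return CRC_SKIPPED;
	return crc32c(payload, len) == stored_crc ? CRC_OK : CRC_MISMATCH;
}
```

With a real CRC in the header, a single flipped bit in the payload
yields CRC_MISMATCH. Once the stored CRC is zeroed, the same damaged
payload comes back as CRC_SKIPPED and is handed on to recovery.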

The first item is a superblock buffer item, which contains 2
regions; a buf log item and a 384 byte long region containing the
logged superblock data.

However, the buf log item has been screwed with to say that it has 8
regions rather than 2, and so when recovery goes to recover the
third region that doesn't exist, it falls off the end of the
allocated transaction buffer.
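
The mismatch can be sketched in a few lines of userspace C. This is
an illustration under stated assumptions, not the recovery code:
struct region, struct recovered_item and both functions are
hypothetical stand-ins, and the 24 byte header length used in the
usage note is a placeholder. Only the 384 byte superblock region and
the 8-versus-2 region counts come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct region {
	const void *addr;	/* start of the region's data */
	size_t      len;	/* length of the region in bytes */
};

struct recovered_item {
	uint32_t             claimed_regions;	/* count stored in the log item (untrusted) */
	uint32_t             actual_regions;	/* regions actually present in the buffer */
	const struct region *regions;
};

/* Return 0 if every claimed region really exists, -1 otherwise. */
static int check_region_count(const struct recovered_item *item)
{
	assert(item != NULL);

	if (item->claimed_regions == 0)
		return -1;
	if (item->claimed_regions > item->actual_regions)
		return -1;
	return 0;
}

/*
 * Walk the regions the item claims to have. Without the check above,
 * a claimed count of 8 against 2 real regions would index past the
 * end of the regions array.
 */
static long total_logged_bytes(const struct recovered_item *item)
{
	size_t total = 0;

	if (check_region_count(item) != 0)
		return -1;
	for (uint32_t i = 0; i < item->claimed_regions; i++)
		total += item->regions[i].len;
	return (long)total;
}
```

For an item with two real regions of 24 and 384 bytes, a claimed
count of 2 gives 408, while a claimed count of 8 is rejected with -1
instead of reading a third region that does not exist.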

We only ever write iclogs with CRCs in them (except for mkfs when it
writes an unmount record to initialise the log), so bit corruptions
like this will get caught before we even start log recovery in
production systems.

We've got enough issues with actual log recovery bugs that we don't
need to be overloaded by being forced to play whack-a-mole with
malicious corruptions that *will not happen in the real world*
because "security!".

Looking at the crash locations for the other bugs, they are all
going to be the same thing - you've corrupted the vector index in
the log item and so they all fall off the end of the buffer because
the index no longer matches the actual contents of the log item.

vvvv THIS vvvv

> If you are going to run some scripted tool to randomly corrupt the
> filesystem to find failures, then you have an ethical and moral
> responsibility to do some of the work to narrow down and identify
> the cause of the failure, not just throw them at someone to do all
> the work.

^^^^ THIS ^^^^^

Please confirm your other reports have the same root cause and close
them if they are. If not, please point us to the unique corruption
in the log that causes the failure.

-Dave.
