On Fri, Apr 01, 2022 at 08:35:39AM +1100, Dave Chinner wrote:
> On Thu, Mar 31, 2022 at 08:07:08PM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215783
> > - Overview
> > kernel NULL pointer dereference and general protection fault in
> > fs/xfs/xfs_buf_item_recover.c:xlog_recover_do_reg_buffer() when mount a
> > corrupted image, sometimes cause kernel hang
> >
> > - Reproduce
> > tested on kernel 5.17.1, 5.15.32
> >
> > $ mkdir mnt
> > $ unzip tmp7.zip
> > $ ./mount.sh xfs 7 ##NULL pointer dereference
> > or
> > $ sudo mount -t xfs tmp7.img mnt ##general protection fault
> >
> > - Kernel dump
>
> You've now raised 4 bugs that all look very similar and are quite
> possibly all caused by the same corruption vector.
> Please do some triage on the failure to identify the
> source of the corruption that triggers this failure.

Ok, the log has been intentionally corrupted in a way that does not
happen in the real world. i.e. the iclog header at the tail of the
log has had the CRC zeroed, so CRC checking for media bit corruption
has been intentionally bypassed by the tool that corrupted the log.

The first item is a superblock buffer item, which contains 2
regions: a buf log item and a 384 byte long region containing the
logged superblock data. However, the buf log item has been screwed
with to say that it has 8 regions rather than 2, and so when
recovery goes to recover the third region that doesn't exist, it
falls off the end of the allocated transaction buffer.

We only ever write iclogs with CRCs in them (except for mkfs when it
writes an unmount record to initialise the log), so bit corruptions
like this will get caught before we even start log recovery in
production systems.

We've got enough issues with actual log recovery bugs that we don't
need to be overloaded by being forced to play whack-a-mole with
malicious corruptions that *will not happen in the real world*
because "security!".
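
To make the overrun described above concrete, here is a minimal
userspace sketch of the mismatch: an item header that claims 8
regions while only 2 were actually reassembled from the log. It is
an illustration only, not the XFS recovery code. The struct names,
field names and helper functions are all hypothetical, and the
24 byte header size used in the usage example is an arbitrary
placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one logged region of a recovered item. */
struct region {
	const void *addr;
	size_t len;
};

/* Hypothetical stand-in for a recovered log item (not an XFS struct). */
struct recovered_item {
	unsigned int claimed_regions;	/* count read from the untrusted header */
	unsigned int actual_regions;	/* regions really present in the buffer */
	const struct region *regions;
};

/* Return 0 if the claimed count fits what was recovered, -1 if not. */
static int check_region_count(const struct recovered_item *item)
{
	assert(item != NULL);

	if (item->claimed_regions == 0)
		return -1;
	/* A claim of 8 regions with only 2 present would walk off the end. */
	if (item->claimed_regions > item->actual_regions)
		return -1;
	return 0;
}

/* Sum the region lengths, refusing to walk an inconsistent item. */
static long total_logged_bytes(const struct recovered_item *item)
{
	long total = 0;
	unsigned int i;

	if (check_region_count(item) != 0)
		return -1;

	for (i = 0; i < item->claimed_regions; i++)
		total += (long)item->regions[i].len;
	return total;
}
```

With two regions present (a header plus the 384 byte superblock
region), an item claiming 2 regions passes the check, while the same
item claiming 8 regions is rejected before any region past the
second is touched.
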
Looking at the crash locations for the other bugs, they are all
going to be the same thing - you've corrupted the vector index in
the log item and so they all fall off the end of the buffer because
the index no longer matches the actual contents of the log item.

vvvv THIS vvvv

> If you are going to run some scripted tool to randomly corrupt the
> filesystem to find failures, then you have an ethical and moral
> responsibility to do some of the work to narrow down and identify
> the cause of the failure, not just throw them at someone to do all
> the work.

^^^^ THIS ^^^^^

Please confirm your other reports have the same root cause and close
them if they do. If not, please point us to the unique corruption in
the log that causes the failure.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx