Re: PROBLEM: another potential concurrency bug in swap_inode_boot_loader()

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Tue, 8 Sep 2020 19:44:50 -0700

On Wed, Sep 09, 2020 at 12:28:36AM +0000, Gong, Sishuai wrote:
> Hi,
> 
> We found a potential concurrency bug in Linux kernel 5.3.11. We were able to reproduce this bug on x86 under specific thread interleavings. This bug causes a “bad header/extent” EXT4-fs error.
> 
> In addition, we think this bug may be related to another bug we reported earlier. Similar to a concern mentioned in your reply, this time the inode had a correct checksum but wrong header data.
> 
> https://lore.kernel.org/linux-ext4/459EE6E3-1CB2-4898-8C5F-283E821B2A75@xxxxxxxxx/T/#t
> 
> 
> ------------------------------------------
> Kernel console output
> 
> EXT4-fs error (device sda1): ext4_ext_check_inode:498: inode #5: comm ski-executor: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> 
> ------------------------------------------
> Test input
> 
> This bug occurs when a kernel test program is executed twice in different threads that run concurrently. Our analysis shows that it happens when the ioctl syscall with the EXT4_IOC_SWAP_BOOT command is called twice and interleaves with itself.
> The test program is generated by Syzkaller as follows:
> r0 = creat(&(0x7f0000000080)='./file0\x00', 0x0)
> ioctl$FS_IOC_SETFLAGS(r0, 0x40046602, &(0x7f0000000040)) 
> r1 = creat(&(0x7f0000000000)='./file0\x00', 0x0)
> pwrite64(r1, &(0x7f00000000c0)='\x00', 0x1, 0x1010000)
> r2 = creat(&(0x7f0000000000)='./file0\x00', 0x0)
> ioctl$EXT4_IOC_SWAP_BOOT(r2, 0x6611)
> 
> ------------------------------------------
> Thread interleaving
> 
> Our analysis revealed that the following interleaving triggers this bug.
> 
> CPU0																CPU1
> swap_inode_boot_loader()
> …
> -- ext4_mark_inode_dirty() [fs/ext4/ioctl.c:207]
> [context switch]
> 																	swap_inode_boot_loader()
> 																	-- ext4_iget()
> 																	---- ext4_isize()
> 																	[context switch]			

How do you end up in this state?  CPU0 has already ext4_iget()'d a
reference to the bootloader inode, right?  Which means that I_NEW is no
longer set on the incore inode, right, because we clear that flag when
we unlock the inode.i_lock at the end of the iget function.  So
shouldn't CPU1's call to ext4_iget to get the same bootloader inode end
up with the same incore inode?  And won't I_NEW be clear by then?
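
To make the question concrete, here is a minimal userspace sketch of the
lookup behaviour being described. It is a toy model, not the kernel's
inode cache: model_inode, model_iget_locked(), model_iget() and
MODEL_I_NEW are hypothetical names invented for illustration, there is no
locking, and it only covers the case where the second lookup starts after
the first one has finished initialising the inode.

```c
/* Userspace model only; this is NOT the kernel's inode cache code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_I_NEW 0x1u        /* stand-in for the kernel's I_NEW bit */
#define MODEL_CACHE_SLOTS 8

struct model_inode {
    unsigned long ino;          /* inode number, used as the cache key */
    unsigned int state;         /* MODEL_I_NEW until initialised */
    bool in_use;
};

static struct model_inode model_cache[MODEL_CACHE_SLOTS];

/* Look up ino; hand out a new in-core inode only on a cache miss. */
static struct model_inode *model_iget_locked(unsigned long ino)
{
    struct model_inode *free_slot = NULL;

    for (size_t i = 0; i < MODEL_CACHE_SLOTS; i++) {
        if (model_cache[i].in_use && model_cache[i].ino == ino)
            return &model_cache[i];     /* hit: the existing object */
        if (!model_cache[i].in_use && !free_slot)
            free_slot = &model_cache[i];
    }
    if (!free_slot)
        return NULL;                    /* model cache is full */
    free_slot->in_use = true;
    free_slot->ino = ino;
    free_slot->state = MODEL_I_NEW;     /* caller must initialise it */
    return free_slot;
}

/* *did_read reports whether the "read from disk" path ran. */
static struct model_inode *model_iget(unsigned long ino, bool *did_read)
{
    struct model_inode *inode = model_iget_locked(ino);

    *did_read = false;
    if (!inode)
        return NULL;
    if (!(inode->state & MODEL_I_NEW))
        return inode;                   /* already initialised: no re-read */

    *did_read = true;                   /* raw inode would be copied here */
    inode->state &= ~MODEL_I_NEW;       /* models clearing I_NEW at the end */
    return inode;
}

/* Two back-to-back lookups of inode #5, the inode in the error message. */
static void model_demo(void)
{
    bool first_read, second_read;
    struct model_inode *a = model_iget(5, &first_read);
    struct model_inode *b = model_iget(5, &second_read);

    assert(a && a == b);                 /* same in-core object */
    assert(first_read && !second_read);  /* only the first call initialises */
    assert(!(b->state & MODEL_I_NEW));   /* flag is already clear */
}
```

Calling model_demo() from a small main() exercises both lookups. In this
model the second call never reaches the initialisation path, which is the
behaviour the questions above expect from the real ext4_iget().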

--D

> …
> -- ext4_mark_inode_dirty() [fs/ext4/ioctl.c:223]
> ---- ext4_mark_iloc_dirty()
> ------ ext4_do_update_inode()
>           for (block = 0; block < EXT4_N_BLOCKS; block++) [fs/ext4/inode.c:5337]
>             raw_inode->i_block[block] = ei->i_data[block];
> …
> [syscall finishes]
> [context switch]
> 																	…
> 											        						for (block = 0; block < EXT4_N_BLOCKS; block++) [fs/ext4/inode.c:5002]
> 																	          ei->i_data[block] = raw_inode->i_block[block];
> 																	…
> 																	---- ext4_ext_check_inode(inode)
> 																	[EXT4-fs error]				
> 
> 
> Thanks,
> Sishuai
>