Thanks for the report Murphy. I just realized we have a collision of mount flags: #define EXT4_MOUNT2_DAX_NEVER 0x00000008 /* Do not allow Direct Access */ #define EXT4_MOUNT2_DAX_INODE 0x00000010 /* For printing options only */ #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM 0x00000008 /* User explicitly specified journal checksum */ #define EXT4_MOUNT2_JOURNAL_FAST_COMMIT 0x00000010 /* Journal fast commit */ So whenever dax=inode is turned on, fast commit also gets turned on. I'll send out a fix for this soon. Thanks, Harshad On Tue, Nov 10, 2020 at 7:10 PM Murphy Zhou <jencce.kernel@xxxxxxxxx> wrote: > > Hi, > > A corner case panic starts since the fast-commit merge. > > ------ > kernel BUG at fs/ext4/ext4_extents.h:199! > invalid opcode: 0000 [#1] SMP PTI > CPU: 5 PID: 1184 Comm: t_mmap_dio Tainted: G E 5.10.0-rc3-master-407ab579637c+ #37 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 > RIP: 0010:ext4_fc_write_inode_data+0x1d0/0x1e0 [ext4] > Code: 7f 00 00 74 25 66 81 ca 00 80 66 89 54 24 30 e9 62 ff ff ff 4c 89 ff e8 9e 63 4d c1 31 c0 eb 84 b8 83 ff ff ff e9 7a ff ff ff <0f> 0b e8 79 c8 4c c1 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 > RSP: 0018:ffff9bc200d53d88 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000008000 > RDX: 0000000000008000 RSI: 0000000000060002 RDI: 000032ee98c0aa2c > RBP: 0000000000008000 R08: ffffffffc0559158 R09: ffffffffc054cdc0 > R10: 0000000000008000 R11: 0000000000000000 R12: 0000000000007fff > R13: ffff9bc200d53e6c R14: ffff88d2e0cf3420 R15: ffff88d2e0cf33a8 > FS: 00007f20006cab80(0000) GS:ffff88d367140000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f2000937000 CR3: 00000003db940002 CR4: 0000000000020ee0 > Call Trace: > ext4_fc_commit+0x874/0x900 [ext4] > ? file_check_and_advance_wb_err+0x2e/0xc0 > ext4_sync_file+0xd4/0x350 [ext4] > __x64_sys_fsync+0x34/0x60 > do_syscall_64+0x33/0x40 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7f20008d1ef7 > ------ > > Bisect result shows the first bad commit: > [aa75f4d3daaeb1389b9cce9d6b84401eaf228d4e] ext4: main fast-commit commit path > > It's hit by running xfstests generic/605 on pmem ramdisk, fsdax enabled. > I've narrowed down to a simplified reproducer, which requires a pmem ramdisk > been setup first, eg /dev/pmem10. And /root/xfstests-dev is built. > > --------------------- Reproducer start ----------- > #! /bin/bash > SCRATCH_DEV=/dev/pmem10 > SCRATCH_MNT=/daxmnt > SRC_FILE=$SCRATCH_MNT/src > DST_FILE=$SCRATCH_MNT/dst > XFS_IO_PROG=/usr/sbin/xfs_io > tsize=$((128 * 1024 * 1024)) > wipefs -af $SCRATCH_DEV > out.full 2>&1 > mkfs.ext4 -b 4096 $SCRATCH_DEV >> out.full 2>&1 || exit > mkdir -p $SCRATCH_MNT > mount -t ext4 -o dax=inode,context=system_u:object_r:nfs_t:s0 $SCRATCH_DEV $SCRATCH_MNT || exit > prep_files() > { > rm -f $SRC_FILE $DST_FILE > $XFS_IO_PROG -f -c "falloc 0 $tsize" $SRC_FILE $DST_FILE >> out.full 2>&1 || exit > } > prep_files > # with O_DIRECT first > /root/xfstests-dev/src/t_mmap_dio $SRC_FILE $DST_FILE 1024 "dio both dax" && echo pass dio > prep_files > # again with buffered IO > /root/xfstests-dev/src/t_mmap_dio -b $SRC_FILE $DST_FILE 1024 "buffered both dax" && echo pass bio > umount /daxmnt > --------------------- Reproducer end ----------- > > The "context=xx" and "dax=inode" mount option are *necessary* to reproduce. > > NO panic reproduced if mount with "-o dax" or "dax=always" or "dax=never". > > The context value seems irrelevant. Other value can also trigger this panic. > The inode size value seems irrelevant. Tested with 128 and 256, both panic. > > Some gdb info and fs info paste below for you ref. > > Thanks, > Murphy > > ------------ gdb info ----------- > (gdb) l *(ext4_fc_write_inode_data+0x1d0/0x1e0) > 0x64fd0 is in ext4_fc_write_inode_data (fs/ext4/fast_commit.c:787). > 782 /* > 783 * Writes updated data ranges for the inode in question. Updates CRC. > 784 * Returns 0 on success, error otherwise. > 785 */ > 786 static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc) > 787 { > 788 ext4_lblk_t old_blk_size, cur_lblk_off, new_blk_size; > 789 struct ext4_inode_info *ei = EXT4_I(inode); > 790 struct ext4_map_blocks map; > 791 struct ext4_fc_add_range fc_ext; > (gdb) l *(ext4_fc_write_inode_data+0x1d0) > 0x651a0 is in ext4_fc_write_inode_data (fs/ext4/ext4_extents.h:199). > 194 } > 195 > 196 static inline void ext4_ext_mark_unwritten(struct ext4_extent *ext) > 197 { > 198 /* We can not have an unwritten extent of zero length! */ > 199 BUG_ON((le16_to_cpu(ext->ee_len) & ~EXT_INIT_MAX_LEN) == 0); > 200 ext->ee_len |= cpu_to_le16(EXT_INIT_MAX_LEN); > 201 } > 202 > 203 static inline int ext4_ext_is_unwritten(struct ext4_extent *ext) > (gdb) l *(ext4_fc_commit+0x874/0x900) > 0x67460 is in ext4_fc_commit (fs/ext4/fast_commit.c:1072). > 1067 * commit_tid if needed. If it's not possible to perform a fast commit > 1068 * due to various reasons, we fall back to full commit. Returns 0 > 1069 * on success, error otherwise. > 1070 */ > 1071 int ext4_fc_commit(journal_t *journal, tid_t commit_tid) > 1072 { > 1073 struct super_block *sb = (struct super_block *)(journal->j_private); > 1074 struct ext4_sb_info *sbi = EXT4_SB(sb); > 1075 int nblks = 0, ret, bsize = journal->j_blocksize; > 1076 int subtid = atomic_read(&sbi->s_fc_subtid); > (gdb) l *(ext4_fc_commit+0x874) > 0x67cd4 is in ext4_fc_commit (fs/ext4/fast_commit.c:972). > 967 */ > 968 ret = ext4_fc_write_inode(inode, crc); > 969 if (ret) > 970 goto lock_and_exit; > 971 > 972 ret = ext4_fc_write_inode_data(inode, crc); > 973 if (ret) > 974 goto lock_and_exit; > 975 > 976 if (!ext4_fc_add_dentry_tlv( > (gdb) > -------------gdb info end ------------ > > --------------- tune2fs -l outout -------- > tune2fs 1.45.5 (07-Jan-2020) > Filesystem volume name: <none> > Last mounted on: <not available> > Filesystem UUID: 281dc870-a1ac-4f0e-85ec-e5cecd3d0f88 > Filesystem magic number: 0xEF53 > Filesystem revision #: 1 (dynamic) > Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum > Filesystem flags: signed_directory_hash > Default mount options: user_xattr acl > Filesystem state: clean > Errors behavior: Continue > Filesystem OS type: Linux > Inode count: 655360 > Block count: 2621440 > Reserved block count: 131072 > Free blocks: 2554687 > Free inodes: 655349 > First block: 0 > Block size: 4096 > Fragment size: 4096 > Group descriptor size: 64 > Reserved GDT blocks: 1024 > Blocks per group: 32768 > Fragments per group: 32768 > Inodes per group: 8192 > Inode blocks per group: 512 > Flex block group size: 16 > Filesystem created: Wed Nov 11 10:20:53 2020 > Last mount time: Wed Nov 11 10:20:53 2020 > Last write time: Wed Nov 11 10:20:53 2020 > Mount count: 1 > Maximum mount count: -1 > Last checked: Wed Nov 11 10:20:53 2020 > Check interval: 0 (<none>) > Lifetime writes: 4137 kB > Reserved blocks uid: 0 (user root) > Reserved blocks gid: 0 (group root) > First inode: 11 > Inode size: 256 > Required extra isize: 32 > Desired extra isize: 32 > Journal inode: 8 > Default directory hash: half_md4 > Directory Hash Seed: b7a0f1fd-a3fa-4669-ab23-3d63df2b16ef > Journal backup: inode blocks > Checksum type: crc32c > Checksum: 0x555540b3 > --------------- tune2fs -l output end -------- > > --------------- mount info ---------- > /dev/pmem10 on /daxmnt type ext4 (rw,relatime,context=system_u:object_r:nfs_t:s0,dax=inode,fast_commit) > --------------- mount info end ---------- > -- > Murphy