在 2022/10/24 6:00, Dave Chinner 写道: > On Fri, Oct 21, 2022 at 07:11:02PM -0700, Darrick J. Wong wrote: >> On Thu, Oct 20, 2022 at 10:17:45PM +0800, Yang, Xiao/杨 晓 wrote: >>> In addition, I don't like your idea about the test change because it will >>> make generic/470 become the special test for XFS. Do you know if we can fix >>> the issue by changing the test in another way? blkdiscard -z can fix the >>> issue because it does zero-fill rather than discard on the block device. >>> However, blkdiscard -z will take a lot of time when the block device is >>> large. >> >> Well we /could/ just do that too, but that will suck if you have 2TB of >> pmem. ;) >> >> Maybe as an alternative path we could just create a very small >> filesystem on the pmem and then blkdiscard -z it? >> >> That said -- does persistent memory actually have a future? Intel >> scuttled the entire Optane product, cxl.mem sounds like expansion >> chassis full of DRAM, and fsdax is horribly broken in 6.0 (weird kernel >> asserts everywhere) and 6.1 (every time I run fstests now I see massive >> data corruption). > > Yup, I see the same thing. fsdax was a train wreck in 6.0 - broken > on both ext4 and XFS. Now that I run a quick check on 6.1-rc1, I > don't think that has changed at all - I still see lots of kernel > warnings, data corruption and "XFS_IOC_CLONE_RANGE: Invalid > argument" errors. Firstly, I think the "XFS_IOC_CLONE_RANGE: Invalid argument" error is caused by the restrictions which prevent reflink work together with DAX: a. fs/xfs/xfs_ioctl.c:1141 /* Don't allow us to set DAX mode for a reflinked file for now. */ if ((fa->fsx_xflags & FS_XFLAG_DAX) && xfs_is_reflink_inode(ip)) return -EINVAL; b. fs/xfs/xfs_iops.c:1174 /* Only supported on non-reflinked files. */ if (xfs_is_reflink_inode(ip)) return false; These restrictions were removed in "drop experimental warning" patch[1]. I think they should be separated from that patch. [1] https://lore.kernel.org/linux-xfs/1663234002-17-1-git-send-email-ruansy.fnst@xxxxxxxxxxx/ Secondly, how the data corruption happened? Or which case failed? Could you give me more info (such as mkfs options, xfstests configs)? > > If I turn off reflink, then instead of data corruption I get kernel > warnings like this from fsx and fsstress workloads: > > [415478.558426] ------------[ cut here ]------------ > [415478.560548] WARNING: CPU: 12 PID: 1515260 at fs/dax.c:380 dax_insert_entry+0x2a5/0x320 > [415478.564028] Modules linked in: > [415478.565488] CPU: 12 PID: 1515260 Comm: fsx Tainted: G W 6.1.0-rc1-dgc+ #1615 > [415478.569221] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > [415478.572876] RIP: 0010:dax_insert_entry+0x2a5/0x320 > [415478.574980] Code: 08 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 58 20 48 8d 53 01 e9 65 ff ff ff 48 8b 58 20 48 8d 53 01 e9 50 ff ff ff <0f> 0b e9 70 ff ff ff 31 f6 4c 89 e7 e8 da ee a7 00 eb a4 48 81 e6 > [415478.582740] RSP: 0000:ffffc90002867b70 EFLAGS: 00010002 > [415478.584730] RAX: ffffea000f0d0800 RBX: 0000000000000001 RCX: 0000000000000001 > [415478.587487] RDX: ffffea0000000000 RSI: 000000000000003a RDI: ffffea000f0d0840 > [415478.590122] RBP: 0000000000000011 R08: 0000000000000000 R09: 0000000000000000 > [415478.592380] R10: ffff888800dc9c18 R11: 0000000000000001 R12: ffffc90002867c58 > [415478.594865] R13: ffff888800dc9c18 R14: ffffc90002867e18 R15: 0000000000000000 > [415478.596983] FS: 00007fd719fa2b80(0000) GS:ffff88883ec00000(0000) knlGS:0000000000000000 > [415478.599364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [415478.600905] CR2: 00007fd71a1ad640 CR3: 00000005cf241006 CR4: 0000000000060ee0 > [415478.602883] Call Trace: > [415478.603598] <TASK> > [415478.604229] dax_fault_iter+0x240/0x600 > [415478.605410] dax_iomap_pte_fault+0x19c/0x3d0 > [415478.606706] __xfs_filemap_fault+0x1dd/0x2b0 > [415478.607744] __do_fault+0x2e/0x1d0 > [415478.608587] __handle_mm_fault+0xcec/0x17b0 > [415478.609593] handle_mm_fault+0xd0/0x2a0 > [415478.610517] exc_page_fault+0x1d9/0x810 > [415478.611398] asm_exc_page_fault+0x22/0x30 > [415478.612311] RIP: 0033:0x7fd71a04b9ba > [415478.613168] Code: 4d 29 c1 4c 29 c2 48 3b 15 db 95 11 00 0f 87 af 00 00 00 0f 10 01 0f 10 49 f0 0f 10 51 e0 0f 10 59 d0 48 83 e9 40 48 83 ea 40 <41> 0f 29 01 41 0f 29 49 f0 41 0f 29 51 e0 41 0f 29 59 d0 49 83 e9 > [415478.617083] RSP: 002b:00007ffcf277be18 EFLAGS: 00010206 > [415478.618213] RAX: 00007fd71a1a3fc5 RBX: 0000000000000fc5 RCX: 00007fd719f5a610 > [415478.619854] RDX: 000000000000964b RSI: 00007fd719f50fd5 RDI: 00007fd71a1a3fc5 > [415478.621286] RBP: 0000000000030fc5 R08: 000000000000000e R09: 00007fd71a1ad640 > [415478.622730] R10: 0000000000000001 R11: 00007fd71a1ad64e R12: 0000000000009699 > [415478.624164] R13: 000000000000a65e R14: 00007fd71a1a3000 R15: 0000000000000001 > [415478.625600] </TASK> > [415478.626087] ---[ end trace 0000000000000000 ]--- > > Even generic/247 is generating a warning like this from xfs_io, > which is a mmap vs DIO racer. Given that DIO doesn't exist for > fsdax, this test turns into just a normal write() vs mmap() racer. > > Given these are the same fsdax infrastructure failures that I > reported for 6.0, it is also likely that ext4 is still throwing > them. IOWs, whatever got broke in the 6.0 cycle wasn't fixed in the > 6.1 cycle. Still working on it... > >> Frankly at this point I'm tempted just to turn of fsdax support for XFS >> for the 6.1 LTS because I don't have time to fix it. > > /me shrugs > > Backporting fixes (whenever they come along) is a problem for the > LTS kernel maintainer to deal with, not the upstream maintainer. > > IMO, the issue right now is that the DAX maintainers seem to have > little interest in ensuring that the FSDAX infrastructure actually > works correctly. If anything, they seem to want to make things > harder for block based filesystems to use pmem devices and hence > FSDAX. e.g. the direction of the DAX core away from block interfaces > that filesystems need for their userspace tools to manage the > storage. > > At what point do we simply say "the experiment failed, FSDAX is > dead" and remove it from XFS altogether? I'll hurry up and try my best to solve these problems. -- Thanks, Ruan. > > Cheers, > > Dave.