Hi Willy,

I'm seeing the following warning with v4.20-rc1 and the "dax.sh" test from the ndctl repository:

[ 69.962873] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 69.969522] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: dax
[ 70.028571] Injecting memory failure for pfn 0x208900 at process virtual address 0x7efe87b00000
[ 70.032384] Memory failure: 0x208900: Killing dax-pmd:7066 due to hardware memory corruption
[ 70.034420] Memory failure: 0x208900: recovery action for dax page: Recovered
[ 70.038878] WARNING: CPU: 37 PID: 7066 at fs/dax.c:464 dax_insert_entry+0x30b/0x330
[ 70.040675] Modules linked in: ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) crct10dif_pclmul(E) crc32_pclmul(E) dax_pmem(OE) crc32c_intel(E) device_dax(OE) ghash_clmulni_intel(E) nd_pmem(OE) nd_btt(OE) serio_raw(E) nd_e820(OE) nfit(OE) libnvdimm(OE) nfit_test_iomap(OE)
[ 70.049936] CPU: 37 PID: 7066 Comm: dax-pmd Tainted: G OE 4.19.0-rc5+ #2589
[ 70.051726] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[ 70.055215] RIP: 0010:dax_insert_entry+0x30b/0x330
[ 70.056769] Code: 84 b7 fe ff ff 48 81 e6 00 00 e0 ff e9 b2 fe ff ff 48 8b 3c 24 48 89 ee 31 d2 e8 10 eb ff ff 49 8b 7d 00 31 f6 e9 99 fe ff ff <0f> 0b e9 f8 fe ff ff 0f 0b e9 e2 fd ff ff e8 82 f1 f4 ff e9 9c fe
[ 70.062086] RSP: 0000:ffffc900086bfb20 EFLAGS: 00010082
[ 70.063726] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea0008220000
[ 70.065755] RDX: 0000000000000000 RSI: 0000000000208800 RDI: 0000000000208800
[ 70.067784] RBP: ffff880327870bb0 R08: 0000000000208801 R09: 0000000000208a00
[ 70.069813] R10: 0000000000208801 R11: 0000000000000001 R12: ffff880327870bb8
[ 70.071837] R13: 0000000000000000 R14: 0000000004110003 R15: 0000000000000009
[ 70.073867] FS: 00007efe8859d540(0000) GS:ffff88033ea80000(0000) knlGS:0000000000000000
[ 70.076547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.078294] CR2: 00007efe87a00000 CR3: 0000000334564003 CR4: 0000000000160ee0
[ 70.080326] Call Trace:
[ 70.081404] ? dax_iomap_pfn+0xb4/0x100
[ 70.082770] dax_iomap_pte_fault+0x648/0xd60
[ 70.084222] dax_iomap_fault+0x230/0xba0
[ 70.085596] ? lock_acquire+0x9e/0x1a0
[ 70.086940] ? ext4_dax_huge_fault+0x5e/0x200
[ 70.088406] ext4_dax_huge_fault+0x78/0x200
[ 70.089840] ? up_read+0x1c/0x70
[ 70.091071] __do_fault+0x1f/0x136
[ 70.092344] __handle_mm_fault+0xd2b/0x11c0
[ 70.093790] handle_mm_fault+0x198/0x3a0
[ 70.095166] __do_page_fault+0x279/0x510
[ 70.096546] do_page_fault+0x32/0x200
[ 70.097884] ? async_page_fault+0x8/0x30
[ 70.099256] async_page_fault+0x1e/0x30

I tried to get this test going on -next before the merge window, but -next was not bootable for me. Bisection points to:

    9f32d221301c dax: Convert dax_lock_mapping_entry to XArray

At first glance I think we need the old "always retry if we slept" behavior.
Otherwise this failure seems similar to the issue fixed by Ross' change to always retry on any potential collision:

    b1f382178d15 ext4: close race between direct IO and ext4_break_layouts()

I'll take a closer look tomorrow to see if that guess is plausible.
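To make the "always retry if we slept" idea concrete, here is a minimal, hypothetical userspace model. It is not the fs/dax.c code. `struct slot`, `replace_and_unlock()`, `lock_entry_retry()` and `lock_entry_stale()` are invented names, a bool stands in for the entry lock bit, and a callback stands in for sleeping on the wait queue. The model shows that anything looked up before a sleep may be stale afterwards, so the retry variant redoes the lookup while the other variant keeps using its old result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one mapping slot; NOT the fs/dax.c code. */
struct slot {
	int value;   /* stands in for the entry stored in the slot */
	bool locked; /* stands in for the entry lock bit */
};

/* Hypothetical stand-in for sleeping on the entry's wait queue. */
typedef void (*sleep_fn)(struct slot *s);

/* Example "other holder": while we sleep, it replaces the entry and unlocks. */
static void replace_and_unlock(struct slot *s)
{
	assert(s->locked); /* only the current holder may release the entry */
	s->value = 2;
	s->locked = false;
}

/* Retry-if-slept: after any sleep, redo the lookup from scratch. */
static int lock_entry_retry(struct slot *s, sleep_fn do_sleep, int *retries)
{
	*retries = 0;
	for (;;) {
		int seen = s->value; /* (re)look up the entry */
		if (!s->locked) {
			s->locked = true; /* take the entry lock */
			return seen;      /* no sleep happened since this lookup */
		}
		do_sleep(s);  /* slot contents may change while we sleep... */
		(*retries)++; /* ...so nothing read before the sleep is trusted */
	}
}

/* Contrast: look up once, sleep until unlocked, then use the old lookup. */
static int lock_entry_stale(struct slot *s, sleep_fn do_sleep)
{
	int seen = s->value; /* read once, before any sleep */
	while (s->locked)
		do_sleep(s);
	s->locked = true;
	return seen; /* may no longer match s->value */
}
```

With a slot that starts as `{ .value = 1, .locked = true }`, `lock_entry_retry()` returns 2 after one retry, while `lock_entry_stale()` returns the stale 1 even though the slot now holds 2.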