fsdax memory error handling regression

Hi Willy,

I'm seeing the following warning with v4.20-rc1 and the "dax.sh" test
from the ndctl repository:

[   69.962873] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[   69.969522] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: dax
[   70.028571] Injecting memory failure for pfn 0x208900 at process virtual address 0x7efe87b00000
[   70.032384] Memory failure: 0x208900: Killing dax-pmd:7066 due to hardware memory corruption
[   70.034420] Memory failure: 0x208900: recovery action for dax page: Recovered
[   70.038878] WARNING: CPU: 37 PID: 7066 at fs/dax.c:464 dax_insert_entry+0x30b/0x330
[   70.040675] Modules linked in: ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) crct10dif_pclmul(E) crc32_pclmul(E) dax_pmem(OE) crc32c_intel(E) device_dax(OE) ghash_clmulni_intel(E) nd_pmem(OE) nd_btt(OE) serio_raw(E) nd_e820(OE) nfit(OE) libnvdimm(OE) nfit_test_iomap(OE)
[   70.049936] CPU: 37 PID: 7066 Comm: dax-pmd Tainted: G           OE     4.19.0-rc5+ #2589
[   70.051726] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
[   70.055215] RIP: 0010:dax_insert_entry+0x30b/0x330
[   70.056769] Code: 84 b7 fe ff ff 48 81 e6 00 00 e0 ff e9 b2 fe ff ff 48 8b 3c 24 48 89 ee 31 d2 e8 10 eb ff ff 49 8b 7d 00 31 f6 e9 99 fe ff ff <0f> 0b e9 f8 fe ff ff 0f 0b e9 e2 fd ff ff e8 82 f1 f4 ff e9 9c fe
[   70.062086] RSP: 0000:ffffc900086bfb20 EFLAGS: 00010082
[   70.063726] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea0008220000
[   70.065755] RDX: 0000000000000000 RSI: 0000000000208800 RDI: 0000000000208800
[   70.067784] RBP: ffff880327870bb0 R08: 0000000000208801 R09: 0000000000208a00
[   70.069813] R10: 0000000000208801 R11: 0000000000000001 R12: ffff880327870bb8
[   70.071837] R13: 0000000000000000 R14: 0000000004110003 R15: 0000000000000009
[   70.073867] FS:  00007efe8859d540(0000) GS:ffff88033ea80000(0000) knlGS:0000000000000000
[   70.076547] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.078294] CR2: 00007efe87a00000 CR3: 0000000334564003 CR4: 0000000000160ee0
[   70.080326] Call Trace:
[   70.081404]  ? dax_iomap_pfn+0xb4/0x100
[   70.082770]  dax_iomap_pte_fault+0x648/0xd60
[   70.084222]  dax_iomap_fault+0x230/0xba0
[   70.085596]  ? lock_acquire+0x9e/0x1a0
[   70.086940]  ? ext4_dax_huge_fault+0x5e/0x200
[   70.088406]  ext4_dax_huge_fault+0x78/0x200
[   70.089840]  ? up_read+0x1c/0x70
[   70.091071]  __do_fault+0x1f/0x136
[   70.092344]  __handle_mm_fault+0xd2b/0x11c0
[   70.093790]  handle_mm_fault+0x198/0x3a0
[   70.095166]  __do_page_fault+0x279/0x510
[   70.096546]  do_page_fault+0x32/0x200
[   70.097884]  ? async_page_fault+0x8/0x30
[   70.099256]  async_page_fault+0x1e/0x30

I tried to get this test going on -next before the merge window, but
-next was not bootable for me. Bisection points to:

    9f32d221301c dax: Convert dax_lock_mapping_entry to XArray

At first glance I think we need the old "always retry if we slept"
behavior. Otherwise this failure seems similar to the issue fixed by
Ross' change to always retry on any potential collision:

    b1f382178d15 ext4: close race between direct IO and ext4_break_layouts()

I'll take a closer look tomorrow to see if that guess is plausible.
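
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of what "always retry if we slept" means in general terms. It is
not the kernel code and not a proposed fix: struct slot, lock_entry_for()
and the two wait hooks are hypothetical names invented for illustration,
and the "sleep" is simulated by a callback that stands in for whatever
another task might do while the waiter is blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapping slot; not a kernel structure. */
struct slot {
	void *entry;   /* what the slot currently points at */
	bool locked;   /* true while some task holds the entry lock */
};

/* Simulated sleep: the hook plays the role of the other task. */
typedef void (*wait_fn)(struct slot *);

static int replacement;  /* hypothetical new entry installed by a racer */

/* Other task simply drops its lock; the entry is unchanged. */
static void other_task_unlocks(struct slot *s)
{
	s->locked = false;
}

/* Other task replaces the entry, then drops its lock. */
static void other_task_replaces(struct slot *s)
{
	s->entry = &replacement;
	s->locked = false;
}

/*
 * Try to lock the entry that maps @page. Returns true with the slot
 * locked, or false if the slot no longer refers to @page.
 * Every pass re-reads the slot, so nothing observed before a sleep
 * is trusted after it.
 */
static bool lock_entry_for(struct slot *s, void *page, wait_fn wait,
			   int *retries)
{
	for (;;) {
		void *entry = s->entry;  /* fresh lookup on each pass */

		if (entry != page)
			return false;    /* mapping changed under us */
		if (!s->locked) {
			s->locked = true;
			return true;
		}
		wait(s);                 /* anything may change here */
		(*retries)++;
		/* We slept: go around again and revalidate everything. */
	}
}
```

The point of the loop is that the check of s->entry is repeated after
every wait, so a waiter that wakes up to a replaced entry backs out
instead of locking something stale.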


