I have spent the last few days attempting to recover my system. I built a raid 5 array with an external journal residing on a 32G Optane, and I set write-back mode with "echo write-back > /sys/block/md0/md/journal_mode" in an /etc/local.d script.
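For context, the setup was roughly equivalent to the sketch below. The mdadm line is illustrative rather than a verbatim record of what I ran, and the script name is made up, but the device names match the logs further down:

  # create the array with the Optane as an external write journal (approximate)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
        --write-journal /dev/nvme0n1

  # /etc/local.d/raid-journal.start (hypothetical name), run at boot
  echo write-back > /sys/block/md0/md/journal_mode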
I had either a power failure or a hardware glitch and rebooted abnormally. Following this I was unable to recover the journal. With 4.14, it hung with "Cannot get enough stripes due to memory pressure. Recovery failed." exactly as described here: https://www.spinics.net/lists/raid/msg59856.html

[ 250.985001] md: md0 stopped.
[ 250.988985] md/raid:md0: device sda1 operational as raid disk 0
[ 250.988985] md/raid:md0: device sdd1 operational as raid disk 3
[ 250.988986] md/raid:md0: device sdc1 operational as raid disk 2
[ 250.988986] md/raid:md0: device sdb1 operational as raid disk 1
[ 250.989232] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[ 250.994896] md/raid:md0: Cannot get enough stripes due to memory pressure. Recovery failed.
[ 250.995259] kmem_cache_destroy raid5-md0: Slab cache still has objects
[ 250.995260] CPU: 14 PID: 4156 Comm: mdadm Not tainted 4.14.65-gentoo #1
[ 250.995261] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X470 Taichi Ultimate, BIOS P1.20 04/17/2018
[ 250.995261] Call Trace:
[ 250.995266]  dump_stack+0x5d/0x79
[ 250.995269]  kmem_cache_destroy+0x4d/0xfc
[ 250.995272]  free_conf+0x87/0x12f [raid456]
[ 250.995274]  raid5_run+0x765/0x831 [raid456]
[ 250.995276]  md_run+0x5c1/0x9a5
[ 250.995278]  ? __wake_up_common_lock+0x76/0xa6
[ 250.995279]  do_md_run+0xa/0x94
[ 250.995280]  md_ioctl+0x13e5/0x13fc
[ 250.995282]  ? __switch_to_asm+0x24/0x60
[ 250.995282]  ? __switch_to_asm+0x30/0x60
[ 250.995283]  ? __switch_to_asm+0x24/0x60
[ 250.995284]  ? __switch_to_asm+0x30/0x60
[ 250.995284]  ? __switch_to_asm+0x24/0x60
[ 250.995285]  ? __switch_to_asm+0x30/0x60
[ 250.995287]  blkdev_ioctl+0x7f4/0x80a
[ 250.995288]  ? __switch_to_asm+0x30/0x60
[ 250.995289]  block_ioctl+0x37/0x3a
[ 250.995291]  vfs_ioctl+0x19/0x26
[ 250.995292]  do_vfs_ioctl+0x508/0x51b
[ 250.995293]  ? __schedule+0x664/0x6b6
[ 250.995294]  SyS_ioctl+0x4d/0x6f
[ 250.995296]  do_syscall_64+0x6d/0xfe
[ 250.995297]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 250.995298] RIP: 0033:0x7f770a5fec97
[ 250.995299] RSP: 002b:00007fff52a72ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 250.995300] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f770a5fec97
[ 250.995300] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004
[ 250.995301] RBP: 00007fff52a73020 R08: 0000559aaa49a4c0 R09: 0000000000000004
[ 250.995301] R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000000b
[ 250.995302] R13: 0000559aaa49af00 R14: 00007fff52a73770 R15: 0000000000000004
[ 250.995326] md/raid:md0: failed to run raid set.
[ 250.995327] md: pers->run() failed ...

Of course, the problem has nothing to do with available memory; I have 32 GiB. I made many attempts using various boot disks (Gentoo admin CD, Debian live DVD, etc.). Having read that the regression began somewhere after 4.11, I tried a Debian live DVD running 4.9.110 and came up with different and scarier errors:

Aug 24 06:15:41 localhost kernel: [ 304.346350] md: bind<sda1>
Aug 24 06:16:53 localhost kernel: [ 376.687498] md: md0 stopped.
Aug 24 06:16:53 localhost kernel: [ 376.687506] md: unbind<sda1>
Aug 24 06:16:53 localhost kernel: [ 376.704800] md: export_rdev(sda1)
Aug 24 06:17:05 localhost kernel: [ 388.580468] md: md0 stopped.
Aug 24 06:17:05 localhost kernel: [ 388.581973] md: bind<sdb1>
Aug 24 06:17:05 localhost kernel: [ 388.582254] md: bind<sdc1>
Aug 24 06:17:05 localhost kernel: [ 388.582539] md: bind<sdd1>
Aug 24 06:17:05 localhost kernel: [ 388.582613] md: bind<nvme0n1>
Aug 24 06:17:05 localhost kernel: [ 388.582858] md: bind<sda1>
Aug 24 06:17:05 localhost kernel: [ 388.669564] raid6: sse2x1 gen() 6270 MB/s
Aug 24 06:17:05 localhost kernel: [ 388.737557] raid6: sse2x1 xor() 5958 MB/s
Aug 24 06:17:05 localhost kernel: [ 388.805570] raid6: sse2x2 gen() 13217 MB/s
Aug 24 06:17:05 localhost kernel: [ 388.873573] raid6: sse2x2 xor() 9201 MB/s
Aug 24 06:17:05 localhost kernel: [ 388.941581] raid6: sse2x4 gen() 13708 MB/s
Aug 24 06:17:06 localhost kernel: [ 389.009582] raid6: sse2x4 xor() 8690 MB/s
Aug 24 06:17:06 localhost kernel: [ 389.077587] raid6: avx2x1 gen() 15528 MB/s
Aug 24 06:17:06 localhost kernel: [ 389.145602] raid6: avx2x2 gen() 20335 MB/s
Aug 24 06:17:06 localhost kernel: [ 389.213597] raid6: avx2x4 gen() 21409 MB/s
Aug 24 06:17:06 localhost kernel: [ 389.213600] raid6: using algorithm avx2x4 gen() 21409 MB/s
Aug 24 06:17:06 localhost kernel: [ 389.213601] raid6: using avx2x2 recovery algorithm
Aug 24 06:17:06 localhost kernel: [ 389.217427] async_tx: api initialized (async)
Aug 24 06:17:06 localhost kernel: [ 389.224559] xor: automatically using best checksumming function avx
Aug 24 06:17:06 localhost kernel: [ 389.244159] md: raid6 personality registered for level 6
Aug 24 06:17:06 localhost kernel: [ 389.244160] md: raid5 personality registered for level 5
Aug 24 06:17:06 localhost kernel: [ 389.244161] md: raid4 personality registered for level 4
Aug 24 06:17:06 localhost kernel: [ 389.244534] md/raid:md0: device sda1 operational as raid disk 0
Aug 24 06:17:06 localhost kernel: [ 389.244538] md/raid:md0: device sdd1 operational as raid disk 3
Aug 24 06:17:06 localhost kernel: [ 389.244540] md/raid:md0: device sdc1 operational as raid disk 2
Aug 24 06:17:06 localhost kernel: [ 389.244542] md/raid:md0: device sdb1 operational as raid disk 1
Aug 24 06:17:06 localhost kernel: [ 389.244883] md/raid:md0: allocated 4374kB
Aug 24 06:17:06 localhost kernel: [ 389.244915] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
Aug 24 06:17:06 localhost kernel: [ 389.244916] RAID conf printout:
Aug 24 06:17:06 localhost kernel: [ 389.244916]  --- level:5 rd:4 wd:4
Aug 24 06:17:06 localhost kernel: [ 389.244917]  disk 0, o:1, dev:sda1
Aug 24 06:17:06 localhost kernel: [ 389.244918]  disk 1, o:1, dev:sdb1
Aug 24 06:17:06 localhost kernel: [ 389.244919]  disk 2, o:1, dev:sdc1
Aug 24 06:17:06 localhost kernel: [ 389.244920]  disk 3, o:1, dev:sdd1
Aug 24 06:17:06 localhost kernel: [ 389.244928] md/raid:md0: using device nvme0n1 as journal
Aug 24 06:17:06 localhost kernel: [ 389.247994] attempt to access beyond end of device
Aug 24 06:17:06 localhost kernel: [ 389.247996] nvme0n1: rw=8, want=116880821, limit=57149440
Aug 24 06:17:06 localhost kernel: [ 389.247996] attempt to access beyond end of device
Aug 24 06:17:06 localhost kernel: [ 389.247997] nvme0n1: rw=8, want=59764149, limit=57149440
Aug 24 06:17:06 localhost kernel: [ 389.248264] attempt to access beyond end of device
Aug 24 06:17:06 localhost kernel: [ 389.248265] nvme0n1: rw=8, want=248587438, limit=57149440
Aug 24 06:17:06 localhost kernel: [ 389.248266] attempt to access beyond end of device
Aug 24 06:17:06 localhost kernel: [ 389.248267] nvme0n1: rw=8, want=4474931993, limit=57149440
Aug 24 06:17:06 localhost kernel: [ 389.248268] attempt to access beyond end of device
Aug 24 06:17:06 localhost kernel: [ 389.248269] nvme0n1: rw=8, want=4417815321, limit=57149440
Aug 24 06:17:06 localhost kernel: [ 389.248269] attempt to access beyond end of device
Aug 24 06:17:06 localhost kernel: [ 389.248270] nvme0n1: rw=8, want=4360698649, limit=57149440
Aug 24 06:17:06 localhost kernel: [ 389.248271] attempt to access beyond end of device
... repeats 2371 times with different offsets.

I should note that, for some reason, the "32G" Optane only has 29.260 GB (27.3 GiB).
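If I'm reading the units right (512-byte sectors), the limit=57149440 in the messages above matches that size exactly, while the offsets being requested are absurdly far beyond it. Plain shell arithmetic:

  $ echo $((57149440 * 512))     # the "limit" converted to bytes
  29260513280                    # = 29.26 GB, about 27.25 GiB
  $ echo $((4474931993 * 512))   # one of the "want" offsets, in bytes
  2291165180416                  # roughly 2.08 TiB, nowhere near the device size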
I overlooked that Documentation/md/raid5-cache.txt states "write-back mode (supported since 4.10)", so I will *guess* that these errors are due to attempting to replay a write-back journal on a kernel without support for it. Is there no versioning or other mechanism in the journal metadata to indicate that it contains write-back data? This alone seems like a critical flaw: the kernel should have refused to use the journal and reported that it uses features that are not available. (BTW, I didn't try booting a 4.10 kernel and attempting recovery.)

So I'm guessing there are two distinct bugs: 1) the journal recovery being broken, and 2) a missing or improperly used versioning mechanism to prevent older kernels from trying to use newer journal data formats. Furthermore, 3) this feature is not marked as experimental and is anything but working. I propose that, AT A VERY MINIMUM, write-back functionality be gated by a Kconfig option that informs the user that it is "(EXPERIMENTAL)". If there are similar bugs affecting raid5-cache in write-through mode, then I propose they both be gated this way, so that raid5-cache support is not even built unless explicitly enabled in .config, probably as a sub-option of MD_RAID456.

I greatly appreciate Facebook contributing to FOSS, but we also don't want to set traps where people use features that lead to heartache the first time their system crashes for some reason.

As for my system, I have just assembled the array read-only without the journal (roughly the commands in the PPS below) and copied the data elsewhere. I'm probably going to reformat the md device without an external journal.

Regards,
Daniel

PS: As a side note, it would be really nice if the stripe journal could also serve as a cache of non-dirty stripes, tracking last use (similar to the page cache), keeping a minimum of free entries for new writes and discarding the least-used stripes as needed. I know that would be involved, but it's just a thought. :)
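PPS: In case it helps anyone who lands in the same situation, the read-only assembly without the journal was along these lines. Treat it as a rough sketch, not a recipe; --force and --readonly are standard mdadm options, but check your own metadata (mdadm --examine) before forcing anything:

  # assemble from the data disks only, leaving the journal device out,
  # and keep the array read-only while copying the data off
  mdadm --assemble --force --readonly /dev/md0 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  mount -o ro /dev/md0 /mnt/recovery   # mount point is illustrative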