(Re-sending without the attachment, in case that's why my previous message appears to have been silently rejected.)

I can't find a list for non-devel posts like this, so sorry if this is going to the wrong place.

I have an older JBOD attached to a Microsemi 3164-8e RAID card (with battery backup) on a CentOS 7 server. It has been up and running for a few weeks now, but backups have failed due to operator errors. While I was working yesterday evening doing some moderate I/O, the first backups (bacula) started, and 3 of the disks in a RAID6 set (7 data + 2 parity) suddenly went offline at the same time. I rebooted, and the RAID card's command-line utility (arcconf) showed the drives were back online and no longer marked "failed". So I gave the command:

$ arcconf setstate 1 logicaldrive 3 optimal

to bring the array back online. The next step was to mount and unmount the filesystem to replay the journal, then run xfs_repair. I gave the mount command and waited ~10 hours, but it is still hung. I can't figure out why; any help would be greatly appreciated.

I realize now I should have cycled the power on the JBOD only (not rebooted the server) to bring the drives back online.
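In case the controller's view of things is useful, I can also post the output of the arcconf state queries, something like (if I have the syntax right):

$ arcconf getconfig 1 ld
$ arcconf getconfig 1 pd

As far as I understand, those report the logical-device and physical-device state; I have left the output out of this message to keep it shorter.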
Details: I created the RAID arrays (7 data + 2 parity + 1 hot spare) like this - taken from my notes:

<notes>
# create the RAID array 2 with physical labels 3.x (4TB disks):
arcconf create 1 logicaldrive stripesize 256 MAX 6 0 5 0 13 0 14 0 34 0 35 0 43 0 44 0 52 0 53 noprompt

# create the RAID array 3 with physical labels 2.x (4TB disks):
arcconf create 1 logicaldrive stripesize 256 MAX 6 0 7 0 8 0 16 0 17 0 25 0 26 0 46 0 47 0 56 noprompt

# create the RAID array 4 with physical labels 1.x (4TB disks):
arcconf create 1 logicaldrive stripesize 256 MAX 6 0 10 0 11 0 19 0 20 0 28 0 29 0 37 0 38 0 59 noprompt

# Combine the three 4TB-disk arrays (called "logicaldrive" in arcconf).
# The three 4TB RAID arrays from arcconf appear as sde, sdf, and sdg.
pvcreate /dev/sde /dev/sdf /dev/sdg
vgcreate volgrp4TB /dev/sde /dev/sdf /dev/sdg
lvcreate -l 100%VG -n lvol4TB volgrp4TB   # creates /dev/volgrp4TB/lvol4TB

# Store the journal on a mirrored M.2 NVMe device:
parted /dev/nvme0n1 mkpart primary 0% 25%
parted /dev/nvme0n1 mkpart primary 25% 50%
parted /dev/nvme0n1 mkpart primary 50% 75%
parted /dev/nvme0n1 mkpart primary 75% 100%
parted /dev/nvme1n1 mkpart primary 0% 25%
parted /dev/nvme1n1 mkpart primary 25% 50%
parted /dev/nvme1n1 mkpart primary 50% 75%
parted /dev/nvme1n1 mkpart primary 75% 100%
parted /dev/nvme0n1 set 1 raid on
parted /dev/nvme0n1 set 2 raid on
parted /dev/nvme0n1 set 3 raid on
parted /dev/nvme0n1 set 4 raid on
parted /dev/nvme1n1 set 1 raid on
parted /dev/nvme1n1 set 2 raid on
parted /dev/nvme1n1 set 3 raid on
parted /dev/nvme1n1 set 4 raid on
mdadm --create /dev/md/nvme1 --level=mirror --raid-devices=2 /dev/nvme0n1p1 /dev/nvme1n1p1
mdadm --create /dev/md/nvme2 --level=mirror --raid-devices=2 /dev/nvme0n1p2 /dev/nvme1n1p2
mdadm --create /dev/md/nvme3 --level=mirror --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
mdadm --create /dev/md/nvme4 --level=mirror --raid-devices=2 /dev/nvme0n1p4 /dev/nvme1n1p4
mdadm --examine --scan --config=/etc/mdadm.conf | tail -4 >> /etc/mdadm.conf
mkfs.xfs -l logdev=/dev/md/nvme2,size=500000b -d su=256k,sw=7 /dev/volgrp4TB/lvol4TB
mkdir -p /export/lvol4TB
echo "/dev/volgrp4TB/lvol4TB /export/lvol4TB auto inode64,logdev=/dev/md/nvme2 1 2" >> /etc/fstab
mount -a
</notes>

After a boot, I can still see the PVs and LVs:

$ pvs; lvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/sdc   volgrp6TB lvm2 a--  <38.21t    0
  /dev/sdd   volgrp6TB lvm2 a--  <38.21t    0
  /dev/sde   volgrp4TB lvm2 a--   25.47t    0
  /dev/sdf   volgrp4TB lvm2 a--   25.47t    0
  /dev/sdg   volgrp4TB lvm2 a--   25.47t    0
  LV      VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol4TB volgrp4TB -wi-ao---- 76.41t
  lvol6TB volgrp6TB -wi-ao---- 76.41t
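Since the filesystem's log is external, I assume the state of the md mirror that holds it matters here too. I have not pasted that output, but I can supply it if useful, e.g.:

$ cat /proc/mdstat
$ mdadm --detail /dev/md/nvme2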
When I try to mount the problematic filesystem using

$ mount -o inode64,logdev=/dev/md/nvme2 -t xfs /dev/volgrp4TB/lvol4TB /export/lvol4TB

or by using mount -a with the following in /etc/fstab:

/dev/volgrp4TB/lvol4TB /export/lvol4TB xfs inode64,logdev=/dev/md/nvme2 1 2

I see this in dmesg:

[ 417.033013] XFS (dm-0): Mounting V5 Filesystem
[ 417.086868] XFS (dm-0): Starting recovery (logdev: /dev/md/nvme2)
[ 417.094291] XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x79/0x100 [xfs], xfs_inode block 0x108605a000
[ 417.094342] XFS (dm-0): Unmount and run xfs_repair
[ 417.094360] XFS (dm-0): First 64 bytes of corrupted metadata buffer:
[ 417.094383] ffffbd6f5ba68000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.094412] ffffbd6f5ba68010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.094439] ffffbd6f5ba68020: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.094467] ffffbd6f5ba68030: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.094512] XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x79/0x100 [xfs], xfs_inode block 0x108605a000

(repeat like that many times)

[ 417.209070] XFS (dm-0): Unmount and run xfs_repair
[ 417.209662] XFS (dm-0): First 64 bytes of corrupted metadata buffer:
[ 417.210263] ffffbd6f5ba68000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.210893] ffffbd6f5ba68010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.211518] ffffbd6f5ba68020: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.212121] ffffbd6f5ba68030: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
[ 417.212709] XFS (dm-0): metadata I/O error: block 0x108605a000 ("xlog_recover_do..(read#2)") error 117 numblks 32
[ 417.213415] XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line 143 of file fs/xfs/libxfs/xfs_dir2_data.c.  Caller xfs_dir3_data_verify+0x9a/0xb0 [xfs]
[ 417.214652] CPU: 19 PID: 11323 Comm: mount Kdump: loaded Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1
[ 417.214653] Hardware name: Supermicro SYS-6028R-TR/X10DRi, BIOS 3.1b 05/16/2019
[ 417.214654] Call Trace:
[ 417.214663]  [<ffffffffa6779262>] dump_stack+0x19/0x1b
[ 417.214680]  [<ffffffffc06af26b>] xfs_error_report+0x3b/0x40 [xfs]
[ 417.214690]  [<ffffffffc069192a>] ? xfs_dir3_data_verify+0x9a/0xb0 [xfs]
[ 417.214700]  [<ffffffffc0691779>] __xfs_dir3_data_check+0x4b9/0x5d0 [xfs]
[ 417.214710]  [<ffffffffc069192a>] xfs_dir3_data_verify+0x9a/0xb0 [xfs]
[ 417.214719]  [<ffffffffc0691971>] xfs_dir3_data_write_verify+0x31/0xc0 [xfs]
[ 417.214730]  [<ffffffffc06acc9e>] ? xfs_buf_delwri_submit_buffers+0x12e/0x240 [xfs]
[ 417.214740]  [<ffffffffc06aada7>] _xfs_buf_ioapply+0x97/0x460 [xfs]
[ 417.214744]  [<ffffffffa60da0b0>] ? wake_up_state+0x20/0x20
[ 417.214754]  [<ffffffffc06acc9e>] ? xfs_buf_delwri_submit_buffers+0x12e/0x240 [xfs]
[ 417.214764]  [<ffffffffc06ac8c1>] __xfs_buf_submit+0x61/0x210 [xfs]
[ 417.214773]  [<ffffffffc06acc9e>] xfs_buf_delwri_submit_buffers+0x12e/0x240 [xfs]
[ 417.214783]  [<ffffffffc06ad97c>] ? xfs_buf_delwri_submit+0x3c/0xf0 [xfs]
[ 417.214793]  [<ffffffffc06ad97c>] xfs_buf_delwri_submit+0x3c/0xf0 [xfs]
[ 417.214806]  [<ffffffffc06daa9e>] xlog_do_recovery_pass+0x3ae/0x6e0 [xfs]
[ 417.214819]  [<ffffffffc06dae59>] xlog_do_log_recovery+0x89/0xd0 [xfs]
[ 417.214830]  [<ffffffffc06daed1>] xlog_do_recover+0x31/0x180 [xfs]
[ 417.214842]  [<ffffffffc06dbfef>] xlog_recover+0xbf/0x190 [xfs]
[ 417.214854]  [<ffffffffc06ce58f>] xfs_log_mount+0xff/0x310 [xfs]
[ 417.214866]  [<ffffffffc06c51b0>] xfs_mountfs+0x520/0x8e0 [xfs]
[ 417.214877]  [<ffffffffc06c82a0>] xfs_fs_fill_super+0x410/0x550 [xfs]
[ 417.214881]  [<ffffffffa624c893>] mount_bdev+0x1b3/0x1f0
[ 417.214892]  [<ffffffffc06c7e90>] ? xfs_test_remount_options.isra.12+0x70/0x70 [xfs]
[ 417.214903]  [<ffffffffc06c6aa5>] xfs_fs_mount+0x15/0x20 [xfs]
[ 417.214905]  [<ffffffffa624d1fe>] mount_fs+0x3e/0x1b0
[ 417.214908]  [<ffffffffa626b377>] vfs_kern_mount+0x67/0x110
[ 417.214910]  [<ffffffffa626dacf>] do_mount+0x1ef/0xce0
[ 417.214913]  [<ffffffffa624521a>] ? __check_object_size+0x1ca/0x250
[ 417.214916]  [<ffffffffa622368c>] ? kmem_cache_alloc_trace+0x3c/0x200
[ 417.214918]  [<ffffffffa626e903>] SyS_mount+0x83/0xd0
[ 417.214921]  [<ffffffffa678bede>] system_call_fastpath+0x25/0x2a
[ 417.214932] XFS (dm-0): Metadata corruption detected at xfs_dir3_data_write_verify+0xad/0xc0 [xfs], xfs_dir3_data block 0x19060071f8
[ 417.216244] XFS (dm-0): Unmount and run xfs_repair
[ 417.216901] XFS (dm-0): First 64 bytes of corrupted metadata buffer:
[ 417.217550] ffff98b8b5edc000: 58 44 44 33 1f 58 d3 06 00 00 00 19 06 00 71 f8  XDD3.X........q.
[ 417.218208] ffff98b8b5edc010: 00 00 00 01 00 11 d1 0e 77 df d5 2b c0 8a 44 38  ........w..+..D8
[ 417.218864] ffff98b8b5edc020: b6 6b 44 d5 a5 17 96 e6 00 00 00 19 06 00 d6 85  .kD.............
[ 417.219527] ffff98b8b5edc030: 07 30 08 d0 00 40 00 20 00 00 00 00 00 00 20 00  .0...@. ...... .
[ 417.220203] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1393 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffffc06aadd7
[ 417.220205] XFS (dm-0): Corruption of in-memory data detected.  Shutting down filesystem
[ 417.220897] XFS (dm-0): Please umount the filesystem and rectify the problem(s)

I would try to umount, but the mount command has hung. Ctrl-C does not kill it. I can see it in `ps aux`, but it's not using any CPU. `kill -9 [pid]` does not kill the process. At this point I can only reboot to kill off the hung mount process.

If I try to run xfs_repair before trying to mount it, it says:

$ xfs_repair -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using external log on /dev/md/nvme2
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

If I try to run xfs_repair after trying to mount it, it says:

$ xfs_repair -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB
xfs_repair: cannot open /dev/volgrp4TB/lvol4TB: Device or resource busy
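Before I resort to xfs_repair -L, my plan (unless someone advises otherwise) would be to do a read-only pass first to get an idea of the scope of the damage, something like:

$ xfs_repair -n -l /dev/md/nvme2 /dev/volgrp4TB/lvol4TB

My understanding is that -n only reports problems and does not modify anything, so it should be safe to run even with the log not yet replayed. Please tell me if that's a bad idea, or if there is a better way to work out why the mount itself hangs.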
I can see in arcconf that the array is rebuilding. I did `echo w > /proc/sysrq-trigger` and captured the output of dmesg, attached to this email.

Here's most of the details that are requested at
https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

$ uname -a
Linux hostname1.localdomain 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ xfs_repair -V
xfs_repair version 4.5.0

$ cat /proc/cpuinfo | grep processor | wc -l
56

$ cat /proc/meminfo
MemTotal:       263856296 kB
MemFree:        261564216 kB
MemAvailable:   261016076 kB
Buffers:            4444 kB
Cached:           183392 kB
SwapCached:            0 kB
Active:           109128 kB
Inactive:         147332 kB
Active(anon):      69304 kB
Inactive(anon):     9992 kB
Active(file):      39824 kB
Inactive(file):   137340 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         68780 kB
Mapped:            29312 kB
Shmem:             10568 kB
Slab:             146228 kB
SReclaimable:      44360 kB
SUnreclaim:       101868 kB
KernelStack:       10176 kB
PageTables:         5712 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    136122448 kB
Committed_AS:     370852 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      712828 kB
VmallocChunk:   34224732156 kB
HardwareCorrupted:     0 kB
AnonHugePages:     14336 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      288568 kB
DirectMap2M:     6940672 kB
DirectMap1G:    263192576 kB

$ cat /proc/partitions
major minor  #blocks  name

   8       16  1953514584 sdb
   8       17  1895636992 sdb1
   8       18    52428800 sdb2
   8        0  1953514584 sda
   8        1      204800 sda1
   8        2     1047552 sda2
   8        3  1895636992 sda3
   8        4    52428800 sda4
   8        5     4194304 sda5
   9      127    52395008 md127
   8       32 41023425112 sdc
   8       48 41023425112 sdd
   8       64 27348895576 sde
   8       80 27348895576 sdf
   8       96 27348895576 sdg
 259        1   244198584 nvme0n1
 259        4    61048832 nvme0n1p1
 259        6    61049856 nvme0n1p2
 259        8    61048832 nvme0n1p3
 259        9    61049856 nvme0n1p4
 259        0   244198584 nvme1n1
 259        2    61048832 nvme1n1p1
 259        3    61049856 nvme1n1p2
 259        5    61048832 nvme1n1p3
 259        7    61049856 nvme1n1p4
   9      126  1895504896 md126
   9      125    61016064 md125
   9      124    61015040 md124
   9      123    61016064 md123
   9      122    61015040 md122
 253        0 82046681088 dm-0
 253        1 82046844928 dm-1

$ xfs_info /dev/volgrp4TB/
xfs_info: /dev/volgrp4TB/ is not a mounted XFS filesystem

Bart
206-550-2606 cell, textable