On Fri, Feb 08, 2019 at 12:49:24PM -0300, Ricardo J. Barberis wrote:
> On Friday 08/02/2019 at 10:17, Brian Foster wrote:
> > On Thu, Feb 07, 2019 at 01:09:38PM -0300, Ricardo J. Barberis wrote:
> > > Hello list!
> > >
> > > I'm having metadata corruption on an XFS filesystem; I googled the error but
> > > didn't find anything about it.
> > >
> > > Background:
> > >
> > > One CentOS 7.6 box with 2 SSD disks and 3 SATA disks.
> > > Those disks are synchronized via DRBD with 5 identical disks on another
> > > identical box (for HA).
> > > The SSDs form an LVM group with one VG and one LV.
> > > This LV is then formatted with XFS and mounted with quotas enabled.
> > > The SATA disks form another LVM group with one VG and one LV, also formatted
> > > with XFS and mounted with quotas enabled.
> > >
> > > Each pair of servers has keepalived to make sure only one of them puts the
> > > DRBD resources as primary and can mount the LVs.
> > >
> > > Relevant extract from lsblk:
> > > sdb            8:16  0 931,5G 0 disk
> > > └─sdb1         8:17  0 931,5G 0 part
> > >   └─drbd2      147:2 0 931,5G 0 disk
> > >     └─VG2-home 253:4 0   1,8T 0 lvm  /home
> > > sdc            8:32  0 894,3G 0 disk
> > > └─sdc1         8:33  0 894,3G 0 part
> > >   └─drbd3      147:3 0 894,2G 0 disk
> > >     └─VG2-home 253:4 0   1,8T 0 lvm  /home
> > > sdd            8:48  0 931,5G 0 disk
> > > └─sdd1         8:49  0 931,5G 0 part
> > >   └─drbd4      147:4 0 931,5G 0 disk
> > >     └─VG3-mail 253:0 0   2,7T 0 lvm
> > >       └─mail   253:5 0   2,7T 0 dm   /Mails
> > > sde            8:64  0 931,5G 0 disk
> > > └─sde1         8:65  0 931,5G 0 part
> > >   └─drbd5      147:5 0 931,5G 0 disk
> > >     └─VG3-mail 253:0 0   2,7T 0 lvm
> > >       └─mail   253:5 0   2,7T 0 dm   /Mails
> > > sdf            8:80  0 931,5G 0 disk
> > > └─sdf1         8:81  0 931,5G 0 part
> > >   └─drbd6      147:6 0 931,5G 0 disk
> > >     └─VG3-mail 253:0 0   2,7T 0 lvm
> > >       └─mail   253:5 0   2,7T 0 dm   /Mails
> > >
> > >
> > > We have several pairs of servers with this same configuration, but on this
> > > particular pair of boxes we're getting metadata corruption only on the SSD LV
> > > and quotas don't get accounted for. dmesg shows these errors on the primary box:
> > >
> >
> > I assume there are different workloads between the two volumes as well,
> > based on the naming above at least, and that dm-4 is the VG2-home volume
> > above..?
>
> Yes, that's correct.
>
> > Either way, can you provide the xfs_info for the associated filesystem?
>
> Sure thing:
>
> [root@c142a ~] # xfs_info /home
> meta-data=/dev/mapper/VG2-home   isize=512    agcount=32, agsize=14651136 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=468830208, imaxpct=5
>          =                       sunit=256    swidth=512 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=228921, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> [root@c142a ~] # xfs_info /Mails
> meta-data=/dev/mapper/mail       isize=512    agcount=32, agsize=22892288 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=732546048, imaxpct=5
>          =                       sunit=256    swidth=768 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=357688, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> >
> > > [root@c142a ~] # dmesg -T | grep XFS
> > > [mié feb 6 18:43:03 2019] SGI XFS with ACLs, security attributes, no debug enabled
> > > [mié feb 6 18:43:03 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [mié feb 6 18:43:03 2019] XFS (dm-4): Starting recovery (logdev: internal)
> > >
> > What happened to require log recovery in the first place?
>
> At that time c142b was acting as primary and crashed, so c142a took over.
>
> We were having some issues with these two servers: power loss in a couple of
> cases, and c142b also crashed a few times. We had to change power supplies and
> RAM.
>
> > > [mié feb 6 18:43:04 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [mié feb 6 18:43:04 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [mié feb 6 18:43:04 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [mié feb 6 18:43:04 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [mié feb 6 18:43:04 2019] XFS (dm-4): log mount/recovery failed: error -117
> > > [mié feb 6 18:43:04 2019] XFS (dm-4): log mount failed
> > >
> > So log recovery and the mount failed. Is this where you ran
> > xfs_repair?
>
> Yes, I was informed that c142b crashed and c142a didn't mount /home, xfs_repair
> complained about the log and had to use -L to "fix" it :(
>
> > > [mié feb 6 18:48:52 2019] XFS (dm-5): Mounting V5 Filesystem
> > > [mié feb 6 18:48:52 2019] XFS (dm-5): Ending clean mount
> > > [mié feb 6 18:48:59 2019] XFS (dm-5): Unmounting Filesystem
> > > [mié feb 6 18:57:25 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [mié feb 6 18:57:25 2019] XFS (dm-4): Ending clean mount
> > > [mié feb 6 18:57:25 2019] XFS (dm-4): Quotacheck needed: Please wait.
> > >
> > Then the mount succeeds (repair presumably zapped the log), a quotacheck
> > was required and before that even completes we run into the same issue.
>
> Yes, it mounted fine but doing a "xfs_quota -x -c 'report /home -b'" triggered the
> error again.
>
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [mié feb 6 18:57:26 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [mié feb 6 18:57:52 2019] XFS (dm-4): Quotacheck: Done.
> > > [mié feb 6 18:58:13 2019] XFS (dm-4): Unmounting Filesystem
> > > [mié feb 6 18:58:15 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [mié feb 6 18:58:15 2019] XFS (dm-4): Ending clean mount
> > > [mié feb 6 18:58:27 2019] XFS (dm-4): Unmounting Filesystem
> > > [mié feb 6 19:01:12 2019] XFS (dm-5): Mounting V5 Filesystem
> > > [mié feb 6 19:01:12 2019] XFS (dm-5): Ending clean mount
> > > [mié feb 6 19:01:12 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [mié feb 6 19:01:12 2019] XFS (dm-4): Ending clean mount
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [mié feb 6 19:03:08 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > >
> > >
> > > We tried xfs_repair but it doesn't seem to fix it.
> > >
> >
> > Does xfs_repair find and fix anything? Please show the associated repair
> > output.
>
> Unfortunately I didn't save xfs_repair output, but I don't believe it fixed
> anything other than the log that first time.

Eric Sandeen amended xfs_repair in xfsprogs 4.17 to detect and zap
corrupt quota blocks.  I don't know what version of xfsprogs CentOS 7.6
ships with, but you might try running something newer?

(Run it with -n first to make sure repair identifies the corrupt dquot
blocks, as is customary...)

--D

> > > We then promoted the secondary and tried xfs_repair there, fearing some memory
> > > issues on the primary, but the result is the same:
> > >
> >
> > I'm not terribly familiar with drbd. I assume this means the primary was
> > offlined and the secondary onlined. IOW, these two filesystems are not
> > ever simultaneously active, correct?
>
> That's correct (drbd has an option to disable that behaviour if you want to use it
> with a clustered filesystem but it's off by default and we never use it).
>
> > Brian
> >
> I see that below I pasted an older dmesg log I had, sorry for that.
>
> > > [root@c142b ~] # dmesg -T | grep XFS
> > > [jue ene 31 19:14:12 2019] SGI XFS with ACLs, security attributes, no debug enabled
> > > [jue ene 31 19:14:12 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [jue ene 31 19:14:12 2019] XFS (dm-4): Ending clean mount
> > > [jue ene 31 19:22:20 2019] XFS (dm-4): Unmounting Filesystem
> > > [jue ene 31 19:23:24 2019] XFS (dm-5): Mounting V5 Filesystem
> > > [jue ene 31 19:23:24 2019] XFS (dm-5): Ending clean mount
> > > [jue ene 31 19:23:24 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [jue ene 31 19:23:24 2019] XFS (dm-4): Ending clean mount
> > > [jue ene 31 19:25:21 2019] XFS (dm-4): Unmounting Filesystem
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Ending clean mount
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Quotacheck needed: Please wait.
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [jue ene 31 19:26:14 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [jue ene 31 19:26:40 2019] XFS (dm-4): Quotacheck: Done.
> > > [jue ene 31 19:34:31 2019] XFS (dm-5): Unmounting Filesystem
> > > [jue ene 31 19:35:13 2019] XFS (dm-4): Unmounting Filesystem
> > > [jue ene 31 19:46:33 2019] XFS (dm-5): Mounting V5 Filesystem
> > > [jue ene 31 19:46:34 2019] XFS (dm-5): Ending clean mount
> > > [jue ene 31 19:46:34 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [jue ene 31 19:46:34 2019] XFS (dm-4): Ending clean mount
> > > [jue ene 31 19:47:18 2019] XFS (dm-4): Unmounting Filesystem
> > > [jue ene 31 19:47:21 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [jue ene 31 19:47:21 2019] XFS (dm-4): Ending clean mount
> > > [jue ene 31 19:47:29 2019] XFS (dm-4): Unmounting Filesystem
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Mounting V5 Filesystem
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Ending clean mount
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Quotacheck needed: Please wait.
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [jue ene 31 19:50:28 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [jue ene 31 19:50:54 2019] XFS (dm-4): Quotacheck: Done.
> > >
> > >
> > > This is a more complete extract of dmesg, where I noticed some context lines
> > > that might be useful:
> > >
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [Thu Feb 7 12:06:45 2019] ffffa0002708a000: 44 51 01 01 00 00 d7 82 00 00 00 00 00 00 00 00  DQ..............
> > > [Thu Feb 7 12:06:45 2019] ffffa0002708a010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 12:06:45 2019] ffffa0002708a020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 12:06:45 2019] ffffa0002708a030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [Thu Feb 7 12:06:45 2019] ffffa003bdb3b000: 44 51 01 01 00 00 d7 82 00 00 00 00 00 00 00 00  DQ..............
> > > [Thu Feb 7 12:06:45 2019] ffffa003bdb3b010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 12:06:45 2019] ffffa003bdb3b020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 12:06:45 2019] ffffa003bdb3b030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 12:06:45 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [Thu Feb 7 13:03:43 2019] ffffa001427e8000: 44 51 01 01 00 00 d7 82 00 00 00 00 00 00 00 00  DQ..............
> > > [Thu Feb 7 13:03:43 2019] ffffa001427e8010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 13:03:43 2019] ffffa001427e8020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 13:03:43 2019] ffffa001427e8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): Metadata CRC error detected at xfs_dquot_buf_read_verify+0x4f/0x90 [xfs], xfs_dquot block 0x4170
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): Unmount and run xfs_repair
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
> > > [Thu Feb 7 13:03:43 2019] ffffa004a3ef1000: 44 51 01 01 00 00 d7 82 00 00 00 00 00 00 00 00  DQ..............
> > > [Thu Feb 7 13:03:43 2019] ffffa004a3ef1010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 13:03:43 2019] ffffa004a3ef1020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 13:03:43 2019] ffffa004a3ef1030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > [Thu Feb 7 13:03:43 2019] XFS (dm-4): metadata I/O error: block 0x4170 ("xfs_trans_read_buf_map") error 74 numblks 8
> > >
> > >
> > > Is there anything else I can try?
> > > Any more info needed?
> > > Should I open a bug report instead?
> > >
> > > I can compile a newer version of xfsprogs but I don't know if it'll help.
> > >
> > >
> >
> Thanks,
> --
> Ricardo J. Barberis
> Linux User No. 250625: http://counter.li.org/
> LFS User No. 5121: http://www.linuxfromscratch.org/
> Senior SysAdmin / IT Architect - www.DonWeb.com
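
For anyone following along, the block address in the dmesg errors can be placed within the filesystem using the geometry from the xfs_info output for /home. This is only a rough sketch: it assumes that "block 0x4170" and "numblks 8" are counted in 512-byte basic blocks (disk addresses), which is my understanding of how XFS reports buffer locations, and the helper name locate_daddr is made up for illustration.

```python
# Assumption: the "block" and "numblks" values in the XFS error messages
# are counted in 512-byte basic blocks, not filesystem blocks.
BASIC_BLOCK_SIZE = 512


def locate_daddr(daddr, numblks, fs_block_size, agsize_blocks):
    """Hypothetical helper: map a disk address to
    (AG number, block within that AG, buffer length in bytes)."""
    basic_blocks_per_fs_block = fs_block_size // BASIC_BLOCK_SIZE
    linear_fs_block = daddr // basic_blocks_per_fs_block
    ag_number, ag_block = divmod(linear_fs_block, agsize_blocks)
    return ag_number, ag_block, numblks * BASIC_BLOCK_SIZE


# Geometry taken from "xfs_info /home": bsize=4096, agsize=14651136 blks
ag, agbno, nbytes = locate_daddr(0x4170, 8, 4096, 14651136)
print(f"AG {ag}, AG block {agbno}, buffer length {nbytes} bytes")
```

Under those assumptions the failing buffer is a single 4096-byte block near the start of AG 0, and every error in the logs above refers to that same block.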