Hi,
We have a new Linux/XFS deployment (about a month old), and at random, without warning, the XFS filesystem shuts down and goes off-line. We are running Scientific Linux release 5.9 with the latest updates.
# uname -a
Linux node24 2.6.18-348.3.1.el5 #1 SMP Mon Mar 11 15:43:13 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
Scientific Linux release 5.9 (Boron)
Here are the errors we see in /var/log/messages after the initial off-line event:
-- snip --
Apr 2 07:50:28 node24 kernel: xfs_iunlink_remove: xfs_inotobp() returned an error 22 on dm-6. Returning error.
Apr 2 07:50:28 node24 kernel: xfs_inactive: xfs_ifree() returned an error = 22 on dm-6
Apr 2 07:50:28 node24 kernel: xfs_force_shutdown(dm-6,0x1) called from line 1419 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffff8855b86b
Apr 2 07:50:28 node24 kernel: Filesystem dm-6: I/O Error Detected. Shutting down filesystem: dm-6
Apr 2 07:50:28 node24 kernel: Please umount the filesystem, and rectify the problem(s)
Apr 2 07:50:52 node24 kernel: Filesystem dm-6: xfs_log_force: error 5 returned.
Apr 2 07:51:52 node24 last message repeated 2 times
-- snip --
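For reference, the "error 22" above should be EINVAL and "error 5" EIO; a quick way to confirm the errno mapping on this box is something like:
# python -c "import os; print os.strerror(22)"
Invalid argument
# python -c "import os; print os.strerror(5)"
Input/output error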
Here are the messages after I umount/xfs_repair/mount the filesystem:
-- snip --
Apr 2 10:23:04 node24 kernel: xfs_force_shutdown(dm-6,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff8855c0fe
Apr 2 10:23:07 node24 kernel: Filesystem dm-6: xfs_log_force: error 5 returned.
Apr 2 10:23:07 node24 last message repeated 4 times
Apr 2 10:24:08 node24 kernel: Filesystem dm-6: Disabling barriers, trial barrier write failed
Apr 2 10:24:08 node24 kernel: XFS mounting filesystem dm-6
Apr 2 10:24:08 node24 kernel: Starting XFS recovery on filesystem: dm-6 (logdev: internal)
Apr 2 10:24:10 node24 kernel: Ending XFS recovery on filesystem: dm-6 (logdev: internal)
Apr 2 10:24:17 node24 multipathd: dm-6: umount map (uevent)
Apr 2 10:58:54 node24 kernel: Filesystem dm-6: Disabling barriers, trial barrier write failed
Apr 2 10:58:54 node24 kernel: XFS mounting filesystem dm-6
-- snip --
We are taking 6 devices from a SAN and using LVM to effectively create a RAID0 block device, which XFS sits on top of. We do not see any multipathd errors.
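For context, the striped LV was put together roughly like this (the multipath device names below are placeholders, not our actual PVs; the -i 6 / -I 256 stripe parameters are the ones that would match the su=256k,sw=6 geometry shown further down):
# pvcreate /dev/mapper/mpath{a..f}
# vgcreate vol_d24 /dev/mapper/mpath{a..f}
# lvcreate -i 6 -I 256 -l 100%FREE -n root vol_d24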
I created the filesystem with this command:
# mkfs.xfs -f -d su=256k,sw=6,sectsize=4096,unwritten=0 -i attr=2 -l sectsize=4096,lazy-count=1 -r extsize=4096 /dev/mapper/vol_d24-root
Here are the mount options:
# cat /etc/fstab | grep xfs
/dev/mapper/vol_d24-root /archive/d24 xfs defaults,inode64 0 9
# mount | grep xfs
/dev/mapper/vol_d24-root on /archive/d24 type xfs (rw,inode64)
Here is the output of xfs_info:
# xfs_info /dev/mapper/vol_d24-root
meta-data="" isize=256 agcount=88, agsize=268435392 blks
= sectsz=4096 attr=2
data = bsize=4096 blocks=23441774592, imaxpct=25
= sunit=64 swidth=384 blks, unwritten=0
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
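(As a sanity check, the sunit/swidth above are consistent with the mkfs options: su=256k on a 4k block size gives sunit = 256/4 = 64 blocks, and sw=6 gives swidth = 64 * 6 = 384 blocks.)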
After the initial off-line event I did the following (the command sequence is sketched after this list):
- umount
- ran xfs_repair (it told me to mount/umount and then re-run xfs_repair)
- mount
- umount
- xfs_repair
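In other words, the sequence was something like this (paths as in the fstab above):
# umount /archive/d24
# xfs_repair /dev/mapper/vol_d24-root
# mount /archive/d24
# umount /archive/d24
# xfs_repair /dev/mapper/vol_d24-root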
Here is the output of xfs_repair:
-- snip --
# xfs_repair /dev/mapper/vol_d24-root
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
2acde2416940: Badness in key lookup (length)
bp=(bno 14657493984, len 16384 bytes) key=(bno 14657493984, len 8192 bytes)
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
2acde2416940: Badness in key lookup (length)
bp=(bno 26065183200, len 16384 bytes) key=(bno 26065183200, len 8192 bytes)
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
2acde2e17940: Badness in key lookup (length)
bp=(bno 43039175488, len 16384 bytes) key=(bno 43039175488, len 8192 bytes)
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- agno = 37
- agno = 38
- agno = 39
- agno = 40
- agno = 41
- agno = 42
- agno = 43
- agno = 44
- agno = 45
- agno = 46
- agno = 47
2acde0613940: Badness in key lookup (length)
bp=(bno 101051527232, len 16384 bytes) key=(bno 101051527232, len 8192 bytes)
2acde0613940: Badness in key lookup (length)
bp=(bno 101081120768, len 16384 bytes) key=(bno 101081120768, len 8192 bytes)
2acde0613940: Badness in key lookup (length)
bp=(bno 102336613216, len 16384 bytes) key=(bno 102336613216, len 8192 bytes)
- agno = 48
- agno = 49
2acde2416940: Badness in key lookup (length)
bp=(bno 107185599392, len 16384 bytes) key=(bno 107185599392, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107606543312, len 16384 bytes) key=(bno 107606543312, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107674994560, len 16384 bytes) key=(bno 107674994560, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107675078656, len 16384 bytes) key=(bno 107675078656, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107675078688, len 16384 bytes) key=(bno 107675078688, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107675078720, len 16384 bytes) key=(bno 107675078720, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107675175008, len 16384 bytes) key=(bno 107675175008, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107704942624, len 16384 bytes) key=(bno 107704942624, len 8192 bytes)
2acde1014940: Badness in key lookup (length)
bp=(bno 107763211904, len 16384 bytes) key=(bno 107763211904, len 8192 bytes)
- agno = 50
2acde1014940: Badness in key lookup (length)
bp=(bno 109436122656, len 16384 bytes) key=(bno 109436122656, len 8192 bytes)
2acde2e17940: Badness in key lookup (length)
bp=(bno 110466056352, len 16384 bytes) key=(bno 110466056352, len 8192 bytes)
2acde2e17940: Badness in key lookup (length)
bp=(bno 110603835392, len 16384 bytes) key=(bno 110603835392, len 8192 bytes)
- agno = 51
- agno = 52
- agno = 53
- agno = 54
- agno = 55
- agno = 56
- agno = 57
- agno = 58
- agno = 59
- agno = 60
- agno = 61
2acde2416940: Badness in key lookup (length)
bp=(bno 132435472416, len 16384 bytes) key=(bno 132435472416, len 8192 bytes)
- agno = 62
2acde2416940: Badness in key lookup (length)
bp=(bno 135330780000, len 16384 bytes) key=(bno 135330780000, len 8192 bytes)
2acde2416940: Badness in key lookup (length)
bp=(bno 135508074496, len 16384 bytes) key=(bno 135508074496, len 8192 bytes)
2acde2416940: Badness in key lookup (length)
bp=(bno 135675982432, len 16384 bytes) key=(bno 135675982432, len 8192 bytes)
- agno = 63
- agno = 64
- agno = 65
- agno = 66
- agno = 67
- agno = 68
- agno = 69
- agno = 70
- agno = 71
- agno = 72
- agno = 73
- agno = 74
- agno = 75
- agno = 76
- agno = 77
- agno = 78
- agno = 79
- agno = 80
- agno = 81
- agno = 82
- agno = 83
- agno = 84
- agno = 85
- agno = 86
- agno = 87
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 5
- agno = 6
- agno = 7
- agno = 2
- agno = 3
- agno = 8
- agno = 9
- agno = 4
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 19
- agno = 20
- agno = 18
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- agno = 37
- agno = 38
- agno = 39
- agno = 40
- agno = 41
- agno = 42
- agno = 43
- agno = 44
- agno = 45
- agno = 46
- agno = 47
- agno = 48
- agno = 49
- agno = 50
- agno = 51
- agno = 52
- agno = 53
- agno = 54
- agno = 55
- agno = 56
- agno = 57
- agno = 58
- agno = 59
- agno = 60
- agno = 61
- agno = 62
- agno = 63
- agno = 64
- agno = 65
- agno = 66
- agno = 67
- agno = 68
- agno = 69
- agno = 70
- agno = 71
- agno = 72
- agno = 73
- agno = 74
- agno = 75
- agno = 76
- agno = 77
- agno = 78
- agno = 79
- agno = 80
- agno = 81
- agno = 82
- agno = 83
- agno = 84
- agno = 85
- agno = 86
- agno = 87
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 202102936036, moving to lost+found
disconnected inode 215350040250, moving to lost+found
disconnected inode 215350208634, moving to lost+found
disconnected inode 271016406074, moving to lost+found
Phase 7 - verify and correct link counts...
done
-- snip --
Any ideas?
Thanks