I have a 24T XFS file system that is very sick, and seemingly getting sicker. I believe the problem to be the file system itself. I have replaced the RAID chassis, the OS, the cables, the drive controller, and most of the drives. Re-syncing the RAID array completes in a reasonable time, given the size of the array, and reports no mismatches. xfs_repair completes, usually with no errors found, sometimes with one or two. Some commands, like df, are now hanging. Writes are often failing with I/O errors. I haven't found any obvious file corruption, but checksums taken with md5sum, md6sum, sha256sum, etc. come back with different values every time they are run on many large files.

What can I do to try to rectify this?
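To illustrate the changing checksums, a loop along these lines is enough to show it (a minimal sketch; /RAID/some_large_file.bin is just a placeholder for one of the affected files):

    # Checksum the same large file several times in a row. Nothing writes
    # to the file in between, yet the digests differ from run to run.
    for i in 1 2 3 4 5; do
        sha256sum /RAID/some_large_file.bin
    done

System details follow.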
Kernel: 3.16.0-4-amd64
xfsprogs: 3.2.3
8 CPUs

/proc/meminfo:

MemTotal: 8095952 kB
MemFree: 7005032 kB
MemAvailable: 7393072 kB
Buffers: 201804 kB
Cached: 310752 kB
SwapCached: 0 kB
Active: 637704 kB
Inactive: 132232 kB
Active(anon): 258320 kB
Inactive(anon): 3888 kB
Active(file): 379384 kB
Inactive(file): 128344 kB
Unevictable: 0 kB
Mlocked: 4 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 40 kB
Writeback: 0 kB
AnonPages: 257376 kB
Mapped: 121392 kB
Shmem: 4824 kB
Slab: 141708 kB
SReclaimable: 98512 kB
SUnreclaim: 43196 kB
KernelStack: 5072 kB
PageTables: 18832 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4047976 kB
Committed_AS: 1189596 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 366160 kB
VmallocChunk: 34359349248 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 88660 kB
DirectMap2M: 4003840 kB
DirectMap1G: 4194304 kB

/proc/mounts:

rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=10240k,nr_inodes=1001559,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=809596k,mode=755 0 0
/dev/sdd2 / ext4 rw,noatime,errors=remount-ro,data=ordered 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
pstore /sys/fs/pstore pstore rw,relatime 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=1619180k 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/dev/sdd1 /boot ext2 rw,noatime 0 0
tmpfs /var/www/vidmgr/artwork tmpfs rw,relatime,size=16384k 0 0
/dev/md2 /OldDrive ext4 rw,relatime,data=ordered 0 0
rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
Backup:/var/www /var/www/backup nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.51,mountvers=3,mountport=49438,mountproto=tcp,local_lock=none,addr=192.168.1.51 0 0
cgroup /sys/fs/cgroup tmpfs rw,relatime,size=12k 0 0
cgmfs /run/cgmanager/fs tmpfs rw,relatime,size=100k,mode=755 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
systemd /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/x86_64-linux-gnu/systemd-shim-cgroup-release-agent,name=systemd 0 0
tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=809596k,mode=700 0 0
Backup:/Backup /Backup nfs rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.51,mountvers=3,mountport=57420,mountproto=tcp,local_lock=none,addr=192.168.1.51 0 0
/dev/md0 /RAID xfs rw,relatime,attr2,inode64,sunit=2048,swidth=12288,noquota 0 0

/proc/partitions:

major minor  #blocks  name
   8     0   125034840 sda
   8     1       96256 sda1
   8     2   112305152 sda2
   8     3    12632064 sda3
   8    16   125034840 sdb
   8    17       96256 sdb1
   8    18   112305152 sdb2
   8    19    12632064 sdb3
   8    32  3907018584 sdc
   9     1       96128 md1
   9     3    12623872 md3
   9     2   112239616 md2
  11     0     1048575 sr0
   8    48   488386584 sdd
   8    49       96256 sdd1
   8    50   112305152 sdd2
   8    51    12632064 sdd3
   9     0 23441319936 md0
   8    64  4883770584 sde
   8    80  4883770584 sdf
   8    96  3907018584 sdg
   8   112  4883770584 sdh
   8   128  4883770584 sdi
   8   144  3907018584 sdj
   8   160  3907018584 sdk

mdadm -D /dev/md0:

/dev/md0:
        Version : 1.2
  Creation Time : Fri Oct  3 20:06:55 2014
     Raid Level : raid6
     Array Size : 23441319936 (22355.39 GiB 24003.91 GB)
  Used Dev Size : 3906886656 (3725.90 GiB 4000.65 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Fri Jul 17 19:47:45 2015
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 1024K
           Name : RAID-Server:0  (local to host RAID-Server)
           UUID : d26e92db:8bd207bb:db9bec69:4117ed57
         Events : 698300

    Number   Major   Minor   RaidDevice State
      10       8      128        0      active sync   /dev/sdi
      12       8      112        1      active sync   /dev/sdh
       8       8       80        2      active sync   /dev/sdf
       9       8       64        3      active sync   /dev/sde
      11       8       96        4      active sync   /dev/sdg
       5       8       32        5      active sync   /dev/sdc
       6       8      160        6      active sync   /dev/sdk
       7       8      144        7      active sync   /dev/sdj

No LVM.
8 SATA disks, various manufacturers, 4 & 5 TB.

dmesg is unremarkable prior to echo w > /proc/sysrq-trigger:

[112915.907065] md: md0: requested-resync done.
[134859.522323] XFS (md0): Mounting V4 Filesystem
[134860.767122] XFS (md0): Ending clean mount
[135019.548703] XFS (md0): Mounting V4 Filesystem
[135019.817854] XFS (md0): Ending clean mount

xfs_info:

meta-data=/dev/md0               isize=256    agcount=32, agsize=183135488 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=5860329984, imaxpct=5
         =                       sunit=256    swidth=1536 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

After echo w > /proc/sysrq-trigger: http://fletchergeek.com/images/dmesg.txt
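For reference, the blocked-task dump linked above was captured roughly like this (a sketch; it assumes sysrq is enabled, and the output filename is arbitrary):

    # 'w' asks the kernel to dump tasks in uninterruptible (blocked) state,
    # which is what the hanging df and stalled writes produce.
    echo w > /proc/sysrq-trigger
    # Save the resulting kernel log for posting.
    dmesg > /tmp/dmesg.txt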